THE LINE-OF-SIGHT PROXIMITY EFFECT AND THE MASS OF QUASAR HOST HALOS
ABSTRACT

We show that the Lyman-α (Lyα) optical depth statistics in the proximity regions of quasar spectra depend on the mass of the dark matter halos hosting the quasars. This is owing to both the overdensity around the quasars and the associated infall of gas toward them. For a fiducial quasar host halo mass of $3.0 \pm 1.6\,h^{-1} \times 10^{12}\,M_\odot$, as inferred by Croom et al. from clustering in the 2dF QSO Redshift Survey, we show that estimates of the ionizing background ($\Gamma^{\rm bkg}$) from proximity effect measurements could be biased high by a factor of ≈ 2.5 at z = 3 owing to neglecting these effects alone. The clustering of galaxies and other active galactic nuclei around the proximity effect quasars enhances the local background, but is not expected to skew measurements by more than a few percent. Assuming the measurements of $\Gamma^{\rm bkg}$ based on the mean flux decrement in the Lyα forest to be free of bias, we demonstrate how the proximity effect analysis can be inverted to measure the mass of the dark matter halos hosting quasars. In ideal conditions, such a measurement could be made with a precision comparable to the best clustering constraints to date from a modest sample of only about 100 spectra. We discuss observational difficulties, including continuum flux estimation, quasar systemic redshift determination, and quasar variability, which make accurate proximity effect measurements challenging in practice. These are also likely to contribute to the discrepancies between existing proximity effect and flux decrement measurements of $\Gamma^{\rm bkg}$.

Subject headings: Cosmology: theory, diffuse radiation — methods: numerical, statistical — galaxies: 
halos — quasars: general, absorption lines 



1. INTRODUCTION 

As the geometrical properties and initial conditions of the Universe are becoming fairly well known (e.g., Spergel et al. 2006), an increasingly important aspect of cosmology concerns the emergence and evolution of complex, non-linear structures. Light sources, stellar and quasistellar, are tracers of these structures and it is thus of prime interest to study their nature and evolution. A measure of the stellar and quasistellar activity is given by the amount and distribution of photons with energy above the hydrogen ionization potential in the intergalactic medium (IGM). The relevant integral quantity is the redshift (z)-dependent background hydrogen photoionization rate, $\Gamma^{\rm bkg}(z)$, which measures the integrated contributions from all sources of ionizing photons. Several measurements of $\Gamma^{\rm bkg}$ exist in the literature (see Figure 1, complemented by the numerical values tabulated in Table 5 of Appendix A). Unfortunately, the agreement between various studies using different methods is poor.

Two main methods have been used to estimate $\Gamma^{\rm bkg}$: the flux decrement method and the line-of-sight proximity effect method. In the flux decrement method (Rauch et al. 1997; Songaila et al. 1999; McDonald & Miralda-Escudé 2001; Meiksin & White 2003; Tytler et al. 2004; Bolton et al. 2005; Kirkman et al. 2005; Jena et al. 2005), one considers the portion of the Lyman-α (Lyα) forest along lines of sight to quasars (QSOs) away from the latter. The parameter $\mu \propto \Omega_b^2 h^3/\Gamma^{\rm bkg}$, where $\Omega_b$ is the cosmic baryon density in units of the critical density and $h \equiv H_0/(100\ {\rm km\,s^{-1}\,Mpc^{-1}})$, is obtained by requiring the mean flux decrement ($D \equiv \langle 1 - e^{-\tau} \rangle$) in a cosmological simulation to agree with the mean decrement observed in quasar spectra. In the proximity effect method (Carswell et al. 1987; Bajtlik et al. 1988; Lu et al. 1991; Kulkarni & Fall 1993; Williger et al. 1994; Cristiani et al. 1995; Giallongo et al. 1996; Srianand & Khare 1996; Scott et al. 2000), in which we will be most interested here, one focuses on the "proximity region" near the quasar, where the ionizing flux emitted by the quasar itself is comparable to the background. (For related work on the transverse proximity effect see, e.g., Schirber et al. 2004; Croft 2004; Adelberger 2004; Hennawi & Prochaska 2006.)

Fig. 1. Existing measurements of $\Gamma^{\rm bkg}$ using the flux decrement (red, circles) and the proximity effect methods (green, squares). The horizontal error bars, where present, indicate the redshift range over which the measurement applies. The vertical bars show the reported uncertainties. The early proximity effect estimate $\Gamma^{\rm bkg} > 7.8 \times 10^{-12}$ s$^{-1}$ at z = 3.75 of Carswell et al. (1987) is not shown. In the redshift range where measurements have been made using both methods, the proximity effect estimates are generally higher by a factor of 2-3. The scatter between the different proximity effect measurements is also larger than in the flux decrement case. Numerical values, references, and additional comments are provided in Table 2.

In the simplest proximity effect model, the quasar lies in a random region of the Universe and the hydrogen gas is in photoionization equilibrium. Neglecting all motion of the gas, the Lyα optical depth at any point is then given by

$$ \tau^{\rm prox} = \frac{\tau^{\rm off}}{1 + \omega(r)}, \qquad (1) $$

where $\omega(r) \equiv \Gamma^{\rm QSO}(r)/\Gamma^{\rm bkg}$, $\Gamma^{\rm QSO}(r)$ is the contribution to the photoionization rate owing to the quasar itself at proper distance r from it, and $\tau^{\rm off}$ is the optical depth that would be obtained if the quasar were turned off. Near the quasar, $\Gamma^{\rm QSO} \propto r^{-2}$ (and hence $\omega$) is large, causing a statistical decrease in observed Lyα absorption. Authors have generally estimated $\Gamma^{\rm bkg}$ by fitting to spectra a model, introduced by Bajtlik et al. (1988) (hereafter BDO), for the variation of the density of absorption lines near the quasars. Integration over known sources of radiation (e.g., Steidel et al. 2001; Hopkins et al. 2006d) can also be used to estimate $\Gamma^{\rm bkg}$. However, this method provides only lower bounds and much of the interest in radiation backgrounds lies in determining whether the measured values agree with the summed contribution of resolved sources. Thus, $\Gamma^{\rm bkg}$ should be measured independently of such estimates and we will not consider them further in this paper.
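As an illustration of the simplest proximity effect model of equation (1), the following minimal Python sketch evaluates the reduced optical depth for an inverse-square quasar flux. The function names and the parameter `r_eq_mpc` (the proper distance at which the quasar and background photoionization rates are equal) are hypothetical conveniences introduced here, not quantities defined in the text.

```python
def omega(r_mpc, r_eq_mpc):
    """omega(r) = Gamma_QSO(r) / Gamma_bkg for Gamma_QSO proportional to r^-2.

    r_eq_mpc is an assumed parameter: the proper distance (Mpc) at which
    Gamma_QSO equals Gamma_bkg, so that omega(r_eq_mpc) = 1.
    """
    return (r_eq_mpc / r_mpc) ** 2


def tau_prox(tau_off, r_mpc, r_eq_mpc):
    """Equation (1): optical depth with the quasar on, given tau with it off."""
    return tau_off / (1.0 + omega(r_mpc, r_eq_mpc))


if __name__ == "__main__":
    # Far from the quasar omega -> 0 and tau_prox -> tau_off;
    # close to the quasar the absorption is strongly suppressed.
    for r in (0.5, 2.0, 5.0, 20.0):
        print(r, tau_prox(1.0, r, r_eq_mpc=5.0))
```

In this picture a measurement of where the absorption recovers, together with the quasar luminosity, fixes the background rate; the rest of the paper concerns why the unperturbed optical depth near a quasar is not that of a random location.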

The proximity effect method has generally yielded measurements of $\Gamma^{\rm bkg}$ higher than flux decrement estimates (Figure 1), suggesting unmodeled biases in one or both methods. There are indeed potential sources of bias in the proximity effect method, mostly owing to the peculiar environments in which quasars are thought to reside, that have not generally been modeled in proximity effect analyses (although see Loeb & Eisenstein 1995; Rollinde et al. 2005; Guimarães et al. 2007, and §9 for a comparison with previous work). These potential biases are a focus of this paper.

In cold dark matter (CDM) models of structure formation, quasars are expected to preferentially form in overdense regions of the universe (e.g., Springel et al. 2005), violating the assumption that they lie in random locations. The overdense environments of quasars not only affect the local density of the absorbing gas, but also its peculiar velocity field. General infall of the gas toward the density peaks where quasars reside is expected (Loeb & Eisenstein 1995; Barkana 2001) and in fact may have been directly detected (Barkana & Loeb 2003). Moreover, galaxies and other active galactic nuclei (AGN) should preferentially form in overdense regions (Bond et al. 1991) and therefore cluster around quasars. The local "background" flux near the quasar may thus be more intense than average. Thermal broadening owing to the finite temperature of the gas also causes redshift-space distortions, increasing the equivalent width of absorption lines.

The flux decrement method is also not guaranteed to be unbiased, most importantly as it requires estimation of the unabsorbed continuum level. This is difficult to do reliably, especially at high redshifts where unabsorbed portions become increasingly rare, if existent at all, in quasar spectra. Alternatively, one can estimate the unabsorbed continuum level by extrapolating from redward of Lyα (e.g., Press & Rybicki 1993). However, there may be a break in the quasar continuum close to Lyα (Zheng et al. 1997; Telfer et al. 2002), which could bias resulting estimates of the flux decrement (Seljak et al. 2003). For further discussion of flux decrement estimation and related issues see, e.g., Seljak et al. 2003, Tytler et al. 2004b, and Lidz et al. 2006a.

The flux decrement method is also sensitive to assumed cosmological parameters, most notably $\Omega_b$, through the factor $\Omega_b h^2$ in $\mu$. If an independent measurement of $\Gamma^{\rm bkg}$ is available, then the flux decrement may be used to infer $\Omega_b$. The proximity effect provides such a measurement and, as integration over known sources can only provide lower bounds on $\Gamma^{\rm bkg}$, is the only known way to use the flux decrement to obtain a full measurement of $\Omega_b$. Agreement between measurements of $\Omega_b$ from light element abundances interpreted in the context of Big Bang nucleosynthesis (e.g., Burles et al. 1999), cosmic microwave background anisotropies (e.g., Spergel et al. 2006), and the flux decrement would provide a powerful consistency check of the standard cosmological model as well as of the gravitational paradigm for the Lyα forest (Hui et al. 2002).

Integration over sources from a luminosity at rest-frame frequencies below the Lyman limit requires knowledge of the fraction of ionizing photons which escape the immediate environment of the emitters and actually have an effect on the surrounding IGM. As recently shown by Shapley et al. (2006), the escape fraction of Lyman break galaxies (LBG) at z ~ 3, whose contribution may well dominate $\Gamma^{\rm bkg}$ at these redshifts (Steidel et al. 2001), is highly uncertain.

A detailed understanding of the proximity effect would also provide insight into the astrophysics of quasar environments and in turn could be used to study these. For example, if the dark matter (DM) halos in which quasars lie can be shown to induce significant biases in measurements of $\Gamma^{\rm bkg}$, then one could parameterize these biases with respect to the quasar host halo mass, $M_{\rm DM}$. Knowledge of the quasar host halo mass and constraints on its dependence on quasar luminosity would be tremendously useful in determining how activity is triggered in AGN. This is especially true in a picture in which activity is related to mergers (e.g., Hopkins et al. 2006a; Lidz et al. 2006b), as such events inevitably increase the host halo mass.

Recent clustering measurements suggest a universal quasar host halo mass. For example, Croom et al. (2005) find that quasars lie in dark matter halos of mass $3.0 \pm 1.6\,h^{-1} \times 10^{12}\,M_\odot$ based on an analysis of the quasar two-point correlation function as measured from 2dF QSO Redshift Survey (2QZ) quasars, only weakly dependent on redshift and with no evidence for a luminosity dependence. If confirmed, this result would provide constraints on quasar evolution and call for an explanation of the physical origin of this critical mass for nuclear activity. Given that halos continuously grow through mergers, a universal mass for AGN activity also constrains quasar lifetimes. Combined with a relation between the mass of the central black hole ($M_{\rm BH}$) and the mass of its host halo (e.g., Ferrarese 2002), a universal halo mass would also suggest a universal black hole mass, which may be at odds with the variety of nuclear black hole masses observed in the local universe (e.g., Kormendy & Richstone 1995) if most galactic nuclei have gone through an active phase, although it is possible that the $M_{\rm BH}$-$M_{\rm DM}$ relation evolves with redshift (Hopkins et al. 2006c; Hopkins et al., in preparation). If the universal quasar host halo mass is correct, it would provide evidence for a redshift-dependent $M_{\rm BH}$-$M_{\rm DM}$ relation. As practically all the evidence for the quasar host halo mass universality is inferred from clustering measurements, independent quasar host halo mass estimates are clearly desirable.

In this work, we quantify the effects of quasar host halos on the Lyα statistics in the proximity regions of quasars and the biases they induce in proximity effect measurements of $\Gamma^{\rm bkg}$ (§4 and §5), develop a Monte Carlo-based method for making unbiased $\Gamma^{\rm bkg}$ measurements (§6), and demonstrate how the analysis can be inverted to use the proximity effect to measure the masses of dark matter halos hosting quasars (§7). Throughout most of the paper, we assume that the analyses are performed on ideal quasar spectra. In §8 we consider observational difficulties which make accurate proximity effect measurements challenging in practice and which may also contribute to the discrepancies between proximity effect and flux decrement measurements of $\Gamma^{\rm bkg}$. We compare with previous work in §9 and conclude in §10. We begin by summarizing conventions used throughout the paper in §2 and describing our numerical framework in §3.

Most previous studies of the proximity effect were based on a picture of the Lyα forest as arising from absorption by discrete gas clouds and on counting absorption lines. Here, we adopt the modern picture of the forest as arising from absorption by smooth density fluctuations imposed on the warm photoionized IGM as a natural consequence of hierarchical structure formation within CDM models. This picture, based on a combination of detailed hydrodynamical simulations (Cen et al. 1994; Zhang et al. 1995; Hernquist et al. 1996; Katz et al. 1996; Miralda-Escudé et al. 1996; Theuns et al. 1998; Davé et al. 1999) and analyses of high-resolution, high signal-to-noise quasar spectra (Lu et al. 1996; Kirkman & Tytler 1997; Kim et al. 2002) with matching properties, is strongly supported by all available evidence.

As we were finalizing this work we became aware of a related study by Kim & Croft (2007). These authors also consider constraining the mass of quasar host halos from the Lyα forest. They, however, focus on using quasar pairs and work under the hypothesis (supported by observations) that the transverse proximity effect is negligible. We, on the other hand, focus on the line-of-sight proximity effect and model it in detail. The two works, while broadly consistent where there is overlap, are hence complementary.

2. CONVENTIONS 

In this paper, $\hat{X}$ denotes a random variable, $X$ a realized value, and $f_{\hat{X}}(X)$ the probability density function (PDF) of $\hat{X}$ evaluated at $X$. The linear overdensity $\delta$ is defined such that $\rho/\bar{\rho} = 1 + \delta$, where $\rho$ is the local mass density and $\bar{\rho}$ is the mean density of the Universe. We assume a flat ΛCDM cosmology similar to the one inferred from Wilkinson Microwave Anisotropy Probe (WMAP) measurements of the cosmic microwave background (Spergel et al. 2006); the exact parameters assumed in our cosmological simulations are given in Table 1. We select "typical" quasars to have spectral indexes ($L_\nu \propto \nu^{-\alpha}$) $\alpha = 1.57$ blueward of Lyα. For the typical specific luminosity at the Lyman limit, we take the median $\log_{10}(L_{912}/{\rm erg\,s^{-1}\,cm^{-2}\,Hz^{-1}}) = 31.1$ for the quasars in the proximity effect analysis of Scott et al. (2000). When we refer to quasars with luminosity one standard deviation from typical, we approximate (generously) the standard deviation in Scott et al. (2000) to
3. NUMERICAL SIMULATIONS AND MOCK SPECTRA 

To study the effects of various potential sources of bias on the measurement of $\Gamma^{\rm bkg}$ from the proximity effect, we will make use of mock quasar spectra generated from cosmological simulations. In generating the mock spectra, we will turn on and off different effects, examining their impact on the derived $\Gamma^{\rm bkg}$. In this section, we describe the simulations that we use and how basic mock spectra are constructed from them. The reader who is not concerned with such numerical details is encouraged to skip to §4, where we begin our discussion of the effects of overdensities and redshift-space distortions.

TABLE 1
Simulation Parameters

Parameter                        Value

N-body Dark Matter Simulation
Code                             GADGET-2 (a)
Box Side Length                  100 $h^{-1}$ comoving Mpc
Boundary Conditions              Periodic
Number of Particles              $256^3$
Particle Mass                    $4.96 \times 10^{9}\,h^{-1}\,M_\odot$
$\Omega_m$                       0.3
$\Omega_\Lambda$                 0.7
$\Omega_b$                       0.04
$h$                              0.7
$\sigma_8$                       0.9
$n$                              1

Friends-of-Friends Halo Finder
Linking Length                   0.2 $\langle \Delta r_p \rangle$ (b)
Minimum Number of Particles      31

Grid
Grid Interpolation               CIC on $256^3$ grid points
Line of Sight Interpolation      TSC

(a) Springel (2005). (b) Mean interparticle spacing.

3.1. Cosmological Dark Matter Simulations 

The first step in our modeling of the Lyα forest is to compute the dark matter density and velocity fields in a simulation box at different redshifts. We do so using the GADGET-2 (Springel 2005) cosmological code. The simulation parameters are given in Table 1. When generating density and velocity profiles and constructing optical depth PDFs in the following sections, significant numbers of halos in different mass ranges are required. This motivates our choice of a large simulation box, with side length 100 $h^{-1}$ comoving Mpc. In order to compare our results with existing proximity effect analyses, it is also necessary to run the simulation to redshifts at least as low as z ≈ 2. We use N-body outputs with $256^3$ particles at z = 2, 3, and 4; the particle mass is $4.96 \times 10^{9}\,h^{-1}\,M_\odot$, so that with the minimum of 31 particles per halo (Table 1) the least massive resolved halos have mass ≈ $1.5 \times 10^{11}\,h^{-1}\,M_\odot$. Ideally, for a detailed comparison with Lyα forest data, we would use a higher resolution, fully hydrodynamic simulation (e.g., Viel et al. 2006). Our present N-body approach should, however, be adequate for investigating the relative impact of quasar environments on Lyα absorption statistics.
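The particle mass quoted in Table 1 follows directly from the box size, particle number, and matter density. The short Python sketch below reproduces it; the value of the critical density in $h^2\,M_\odot\,{\rm Mpc^{-3}}$ is the standard one, and the function name is an illustrative choice rather than anything taken from the simulation code.

```python
RHO_CRIT = 2.775e11  # critical density today [h^2 M_sun / Mpc^3]


def particle_mass(box_mpc_h, n_side, omega_m):
    """Mass per particle [h^-1 M_sun] for n_side^3 equal-mass particles
    filling a periodic box of side box_mpc_h [h^-1 comoving Mpc]."""
    total_mass = omega_m * RHO_CRIT * box_mpc_h ** 3
    return total_mass / n_side ** 3


m_particle = particle_mass(box_mpc_h=100.0, n_side=256, omega_m=0.3)
m_halo_min = 31 * m_particle  # 31-particle minimum of the halo finder

if __name__ == "__main__":
    print(f"particle mass     = {m_particle:.3e} h^-1 M_sun")
    print(f"minimum halo mass = {m_halo_min:.3e} h^-1 M_sun")
```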



3.2. Mock Quasar Spectra from Dark Matter Simulations

In the simplest model, the neutral hydrogen (HI) density at any given point in the Universe is given by photoionization equilibrium. Let $\Gamma^{\rm tot}$ (s$^{-1}$) be the total HI photoionization rate per atom, $R(T)$ (cm$^3$ s$^{-1}$) the recombination rate, $n_{\rm HI}$ the HI number density, $n_{\rm HII}$ the ionized hydrogen (HII) number density, and $n_e$ the free electron number density in proper units. Then equilibrium requires

$$ \Gamma^{\rm tot}\, n_{\rm HI} = R(T)\, n_e\, n_{\rm HII}. \qquad (2) $$

For a small neutral fraction $x_{\rm HI} \equiv n_{\rm HI}/n_{\rm HII} \ll 1$ (which is certainly true in the IGM at the redshifts of interest here, z < 6, as indicated by the absence of a Gunn-Peterson trough in the spectra of the known quasars at these redshifts),

$$ n_{\rm HI} = \frac{R(T)\, n_{\rm tot}^2}{\Gamma^{\rm tot}}, \qquad (3) $$

where $n_{\rm tot} \equiv n_{\rm HI} + n_{\rm HII} \approx \Omega_b X_{\rm H} \rho_c/m_p$ is the total hydrogen number density. We take the cosmic hydrogen mass fraction $X_{\rm H} = 0.75$ (e.g., Burles et al. 1999). The above expression for $n_{\rm HI}$ neglects the helium contribution to the free electron density. For fully singly ionized helium, we make an error of 8% on $n_e$ and hence on $n_{\rm HI}$; for fully doubly ionized helium, the error is doubled. Simplifying approximations such as this one have no impact on our discussion, which should be relatively independent of the details of the cosmology. For the recombination rate, we use the approximate expression

$$ R(T) = 4.2 \times 10^{-13} \left( \frac{T}{10^4\,{\rm K}} \right)^{-0.7} {\rm cm^3\,s^{-1}} \qquad (4) $$

(Hui & Gnedin 1997), where T is the local gas temperature.

For $\delta \lesssim 5$, Hui & Gnedin (1997) derived the equation of state

$$ T = T_0 (1 + \delta)^{\beta} \qquad (5) $$

for the IGM, where $T_0(z)$ is the temperature of a fluid element which remains at the cosmic mean density and $\beta(z)$ parameterizes the density dependence. Detailed expressions for $T_0$ and $\beta$, which we use in our numerical calculations, are given in Appendix B. To order of magnitude, $T_0 \sim 10^4$ K and in the limit of early reionization ($z_{\rm reion} \gtrsim 10$, consistent with recent measurements of the Thomson optical depth to reionization by Page et al. 2006), $\beta \to 0.62$. Temperature measurements from the z ~ 3 Lyα forest favor slightly larger temperatures and a slightly less steep density dependence (e.g., McDonald et al. 2001), possibly owing to recent He II reionization. Our basic conclusions should, however, not depend on the precise temperature-density relation assumed.
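Equations (3)-(5) can be combined into a few lines of code. The Python sketch below is only illustrative: it adopts the order-of-magnitude values $T_0 = 10^4$ K and $\beta = 0.62$ quoted above rather than the detailed expressions of Appendix B, and it assumes that the mean hydrogen density is scaled to a local proper density with the standard factor $(1+\delta)(1+z)^3$. Function names and default arguments are hypothetical.

```python
import math

M_P = 1.6726e-24          # proton mass [g]
RHO_CRIT_H2 = 1.8788e-29  # critical density today [g cm^-3 h^-2]


def temperature(delta, t0=1.0e4, beta=0.62):
    """Equation (5): T = T0 (1 + delta)^beta, with assumed T0 and beta."""
    return t0 * (1.0 + delta) ** beta


def recombination_rate(temp):
    """Equation (4): R(T) in cm^3 s^-1."""
    return 4.2e-13 * (temp / 1.0e4) ** -0.7


def n_hydrogen(delta, z, omega_b=0.04, h=0.7, x_h=0.75):
    """Proper total hydrogen number density [cm^-3].

    Assumption of this sketch: the cosmic mean is scaled by
    (1 + delta) (1 + z)^3 to obtain the local proper density.
    """
    mean_today = omega_b * x_h * RHO_CRIT_H2 * h ** 2 / M_P
    return mean_today * (1.0 + delta) * (1.0 + z) ** 3


def n_hi(delta, z, gamma_tot):
    """Equation (3): n_HI = R(T) n_tot^2 / Gamma_tot, in cm^-3."""
    n_tot = n_hydrogen(delta, z)
    return recombination_rate(temperature(delta)) * n_tot ** 2 / gamma_tot


if __name__ == "__main__":
    # Neutral fraction at mean density, z = 3, Gamma_tot = 1e-12 s^-1.
    print(n_hi(0.0, 3.0, 1.0e-12) / n_hydrogen(0.0, 3.0))
```

With these inputs the neutral hydrogen density scales as $(1+\delta)^{2 - 0.7\beta}$, which is why modest overdensities around quasar host halos translate into a noticeably larger optical depth.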

For a quasar with isotropic specific luminosity $L_\nu = L_Q (\nu/\nu_{\rm Ly})^{-\alpha_Q}$, where $\nu_{\rm Ly}$ is the hydrogen ionization frequency in its ground state (the Lyman limit) and $\alpha_Q$ is the spectral index, $\Gamma^{\rm tot} = \Gamma^{\rm QSO}(L_Q, \alpha_Q; r) + \Gamma^{\rm bkg}(z)$, where

$$ \Gamma^{\rm QSO}(L_Q, \alpha_Q; r) \equiv \int_{\nu_{\rm Ly}}^{\infty} d\nu\, \sigma(\nu)\, \frac{L_\nu}{4\pi r^2 h_P \nu} \qquad (6) $$

$$ = \frac{A_0 L_Q}{4\pi h_P (\alpha_Q + 3) r^2}. \qquad (7) $$

Here, $h_P$ is Planck's constant and the hydrogen photoionization cross section is approximately given by

$$ \sigma(\nu) = A_0 \left( \frac{\nu}{\nu_{\rm Ly}} \right)^{-3} \qquad (8) $$

for $\nu$ near $\nu_{\rm Ly}$, where $A_0 = 6.30 \times 10^{-18}$ cm$^2$ (Osterbrock & Ferland 2006). In equation (6), we assume that the proper distance to the quasar, r, is sufficiently small that the point is effectively at the same redshift as the quasar, $z \approx z_Q$.
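The closed form of equation (7) is straightforward to evaluate. The Python sketch below does so in cgs units for the typical quasar of §2, interpreting the quoted median $\log_{10} L_{912} = 31.1$ as a specific luminosity in erg s$^{-1}$ Hz$^{-1}$ (an assumption of this sketch). The helper `r_equal`, which returns the distance at which the quasar and background rates coincide, is a hypothetical convenience and not a quantity defined in the text.

```python
import math

H_PLANCK = 6.626e-27   # Planck constant [erg s]
A0 = 6.30e-18          # photoionization cross section at the Lyman limit [cm^2]
CM_PER_MPC = 3.0857e24


def gamma_qso(l_q, alpha_q, r_mpc):
    """Equation (7): quasar photoionization rate [s^-1].

    l_q is the specific luminosity at the Lyman limit [erg s^-1 Hz^-1],
    alpha_q the spectral index, r_mpc the proper distance [Mpc].
    """
    r_cm = r_mpc * CM_PER_MPC
    return A0 * l_q / (4.0 * math.pi * H_PLANCK * (alpha_q + 3.0) * r_cm ** 2)


def r_equal(l_q, alpha_q, gamma_bkg):
    """Proper distance [Mpc] at which Gamma_QSO equals Gamma_bkg."""
    norm = A0 * l_q / (4.0 * math.pi * H_PLANCK * (alpha_q + 3.0))
    return math.sqrt(norm / gamma_bkg) / CM_PER_MPC


if __name__ == "__main__":
    l_typ, alpha_typ = 10.0 ** 31.1, 1.57
    print(gamma_qso(l_typ, alpha_typ, 1.0))   # rate at 1 proper Mpc
    print(r_equal(l_typ, alpha_typ, 1.0e-12)) # size of the proximity region
```

For these inputs the two rates are equal a few proper Mpc from the quasar, which sets the scale of the proximity region examined in the following sections.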

The N-body dark matter simulation provides the position and velocity of each particle in the box. These are interpolated on a uniform grid using a cloud-in-cell (CIC) algorithm (Hockney & Eastwood 1988) to construct density and velocity fields. Triangular-shaped cloud (TSC) interpolation is then used to smoothly project the fields onto sight lines which are drawn through the box with randomly selected orientations. We obtain fields $\delta(r)$ and $v_\parallel(r)$ ($v_\parallel > 0$ indicates motion toward the observer) for the density and velocity along the line of sight, respectively, as a function of proper distance from the quasar. From the density field, we compute the corresponding temperature field using the equation of state (5). Given quasar properties $Q \equiv \{z_Q, L_Q, \alpha_Q\}$, $\Gamma^{\rm QSO}$ is computed at each pixel using equation (6). For simplicity, in §5-§7 all simulated quasars are given the median luminosity from the Scott et al. (2000) sample (see §2).
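To make the cloud-in-cell step concrete, here is a minimal one-dimensional Python sketch of CIC mass assignment on a periodic grid. It is a simplified illustration only: the interpolation described above is three-dimensional and is also applied to velocities, and the function name and interface are hypothetical.

```python
import numpy as np


def cic_density_1d(positions, n_cells, box_size):
    """Cloud-in-cell mass assignment on a periodic 1D grid.

    Each unit-mass particle is shared between the two nearest cell
    centers with linear weights. Returns rho / mean(rho) = 1 + delta.
    """
    dx = box_size / n_cells
    # Position in cell units, measured from the center of cell 0.
    s = np.asarray(positions, dtype=float) / dx - 0.5
    left = np.floor(s).astype(int)
    w_right = s - left  # weight given to the right-hand neighbor
    counts = np.zeros(n_cells)
    # np.add.at accumulates correctly when several particles share a cell.
    np.add.at(counts, left % n_cells, 1.0 - w_right)
    np.add.at(counts, (left + 1) % n_cells, w_right)
    return counts / counts.mean()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.uniform(0.0, 100.0, size=10000)
    print(cic_density_1d(x, n_cells=16, box_size=100.0))
```

The linear weights make the resulting field continuous in the particle positions, which is the property that motivates CIC over nearest-grid-point assignment.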

We assume that the baryons trace the dark matter.$^3$ Although no Jeans-scale smoothing filter is explicitly applied, our simulation has resolution comparable to the Jeans scale (~40 km s$^{-1}$ and ~20 km s$^{-1}$, respectively, at z = 3) and we do model thermal broadening in our mock spectra (see §4.2). The HI number density along the line of sight is then given by equation (3). Ignoring redshift-space distortions for the moment and for a negligible line width, the Lyα optical depth at any given point is

$$ \tau = \frac{\pi e^2 f_{\rm Ly\alpha}}{m_e H_0 \nu_{\rm Ly\alpha}}\, \frac{n_{\rm HI}}{\sqrt{\Omega_m (1+z)^3 + \Omega_\Lambda}}, \qquad (9) $$

where $f_{\rm Ly\alpha}$ is the oscillator strength and $\nu_{\rm Ly\alpha}$ is the frequency of the Lyα transition. Here, the optical depth as a function of distance from the quasar, $\tau(r)$ (equivalently, the transmission coefficient, $e^{-\tau}$), is the quantity of interest and this is what we will refer to as a "spectrum."
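Equation (9) can be evaluated directly once the neutral hydrogen density is known. The Python sketch below uses cgs constants and the cosmological parameters of Table 1; the function names are illustrative, and the example density in the usage lines is an assumed round number of the order expected at mean density at z = 3 for $\Gamma^{\rm tot} \sim 10^{-12}$ s$^{-1}$, not a value taken from the simulations.

```python
import math

E_CHARGE = 4.8032e-10   # electron charge [esu]
M_E = 9.1094e-28        # electron mass [g]
F_LYA = 0.4162          # Lya oscillator strength
NU_LYA = 2.4661e15      # Lya frequency [Hz]
KM_PER_MPC = 3.0857e19


def hubble_rate(z, h=0.7, omega_m=0.3, omega_l=0.7):
    """H(z) in s^-1 for a flat LCDM cosmology."""
    h0 = 100.0 * h / KM_PER_MPC
    return h0 * math.sqrt(omega_m * (1.0 + z) ** 3 + omega_l)


def tau_lya(n_hi, z):
    """Equation (9): Lya optical depth for proper HI density n_hi [cm^-3]."""
    prefactor = math.pi * E_CHARGE ** 2 * F_LYA / (M_E * NU_LYA)
    return prefactor * n_hi / hubble_rate(z)


def transmission(n_hi, z):
    """Transmission coefficient exp(-tau)."""
    return math.exp(-tau_lya(n_hi, z))


if __name__ == "__main__":
    n_example = 5.0e-11  # assumed illustrative HI density [cm^-3]
    print(tau_lya(n_example, 3.0), transmission(n_example, 3.0))
```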

Figure 2 shows examples of mock spectra constructed from the simulation. Figure 3 shows the transmission coefficient as a function of distance from the quasar averaged over ensembles of 1000 quasars at z = 2, 3, and 4. The increase in the mean transmission near the quasars illustrates the proximity effect. We note that the effect is more pronounced at higher redshifts, even for fixed $\Gamma^{\rm bkg}$ and quasar luminosity, owing to the increase in the neutral hydrogen density attributed to the cosmological expansion. At low redshifts, the IGM is more dilute and hence the transmission is high even away from quasars. This suggests, as corroborated by our analysis in §5.5, that it is easier to obtain constraints on $\Gamma^{\rm bkg}$ from the proximity effect at higher redshifts. On the other hand, bright quasars are rarer, and continuum fitting is more difficult, at high redshift.

4. OVERDENSITIES, INFALL, AND CLUSTERING AROUND QUASARS

After setting up the basis of our numerical framework in the previous section, we are now ready to discuss the effects of overdense quasar environments on mock spectra. We describe how we model matter overdensities in §4.1 and the redshift-space distortions owing to gas infall in §4.2. We then pause to take a look at how the optical depth PDFs are affected by the overdensities and redshift-space distortions associated with massive dark matter host halos in §4.3. The biases they induce on measurements of $\Gamma^{\rm bkg}$ are given full consideration later in §5, after describing our likelihood formalism. We estimate the effect of the clustering of galaxies and other AGN around proximity effect quasars on measurements of $\Gamma^{\rm bkg}$ in §4.4, concluding that it is relatively unimportant.

3 See, e.g., Meiksin & White (2001) and Viel et al. (2006) for comparisons between Lyα forest statistics extracted from dark matter, pseudo-hydrodynamic, and fully hydrodynamic simulations.

Fig. 2. Examples of mock spectra (shown as the transmission coefficient as a function of the redshift-space distance from the quasar) constructed from the simulation at z = 2, 3, and 4 (top, middle, and bottom panel). In each case, the quasar has typical luminosity and $\Gamma^{\rm bkg} = 10^{-12}$ s$^{-1}$. The local overdensities in which the quasars reside and the redshift-space distortions are modeled as in §4.1 and §4.2.

4.1. Overdensities 

Structures in the Universe such as galaxies and quasars form in the collapse of regions of the Universe with mean density exceeding the critical value necessary to overcome the Hubble flow. Such regions preferentially arise in larger-scale overdensities (e.g., Bond et al. 1991). In addition, the density enhancement around collapsed objects generally extends well beyond their virial radius (Barkana 2004; Prada et al. 2006). As a consequence, quasars are expected to be surrounded by an excess of absorbing gas with respect to a random location in the Universe. In fact, there is strong and growing evidence from clustering measurements that quasars reside in massive dark matter halos. Using AGN-AGN clustering measurements from the 2dF QSO Redshift Survey (2QZ) at 0.8 < z < 2.1, Porciani et al. (2004) find a minimum mass for a DM halo to host a quasar of $10^{12}\,M_\odot$, with a characteristic mass ~$10^{13}\,M_\odot$. This result has been recently confirmed by Porciani & Norberg (2006).

Fig. 3. Transmission coefficient as a function of redshift-space distance from the quasar averaged over samples of 1000 mock spectra for quasars of typical luminosity and $\Gamma^{\rm bkg} = 10^{-12}$ s$^{-1}$ at z = 2, 3, and 4 (top, middle, and bottom panel). The proximity effect, seen as an increase in the mean transmission near the quasars, is stronger at higher redshifts where the absorbing gas is denser owing to the cosmological expansion. The local overdensities in which the quasars reside and the redshift-space distortions are modeled as in §4.1 and §4.2. In each case, the curves were smoothed with a boxcar of width 0.2 comoving Mpc.

Similarly, Wake et al. (2004) conclude, from a sample of narrow-line AGN from the Sloan Digital Sky Survey (SDSS) Data Release 1, that the minimum host DM halo mass is $2 \times 10^{12}\,M_\odot$. From 2QZ AGN-AGN clustering data at 0.3 < z < 2.2, Croom et al. (2005) find that quasars lie in DM halos of mass $3.0 \pm 1.6\,h^{-1} \times 10^{12}\,M_\odot$, regardless of luminosity. Coil et al. (2006) measure the AGN-galaxy cross-correlation function at 0.7 < z < 1.4, using data from the SDSS and Deep Extragalactic Evolutionary Probe (DEEP) 2 survey, and find a minimum DM halo mass ~$5 \times 10^{11}\,M_\odot$, with mean $3 \times 10^{12}\,M_\odot$, again with no evidence for luminosity dependence. The lack of dependence of clustering (and hence of DM halo mass) on quasar luminosity has been confirmed at higher redshifts by Adelberger & Steidel (2005a), who show that the AGN-galaxy cross-correlation length at 1.8 < z < 3.5 is constant over a range of 10 optical magnitudes. Finally, Barkana & Loeb (2003) estimate the DM host halo masses of the bright quasars SDSS 1122-0229 at z = 4.795 ± 0.004 and SDSS 1030+0524 at z = 6.28 ± 0.02 to be $2.5 \times 10^{12}\,M_\odot$ and $4.0 \times 10^{12}\,M_\odot$, respectively, independently of clustering, based on the spectral signature of gas infall. Hopkins et al. (2006) compare clustering of quasars and galaxies as a function of luminosity, redshift, and color and find that the clustering of local ellipticals is in accord with models which associate quasar activity with the formation of spheroids (Hopkins et al. 2006b).

We model the overdensities in which quasars reside by putting each mock quasar at the center of mass of a DM halo with mass $M_{\rm DM}$ in a specified range $[M_{\rm min}, M_{\rm max}]$ randomly chosen in the simulation box. To identify halos in the box, we use a friends-of-friends algorithm (e.g., Davis et al. 1985; Springel & Hernquist 2003) with linking length $b = 0.2 \times \langle \Delta r_p \rangle$, where the mean interparticle spacing $\langle \Delta r_p \rangle \equiv L/N$ for a box with side length L and $N^3$ particles. The mass of the halo is then the sum of the particle masses.
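The friends-of-friends construction described above can be illustrated with a small Python sketch. This is a toy implementation, assuming a brute-force pair search with a union-find structure and periodic boundaries; it is meant to show the linking rule, not to reproduce the halo finder actually used, and all names are hypothetical.

```python
import numpy as np


def fof_groups(pos, linking_length, box_size):
    """Friends-of-friends group labels for particles in a periodic box.

    Two particles closer than linking_length are 'friends'; groups are
    the connected components of the friendship graph (O(N^2) search).
    """
    pos = np.asarray(pos, dtype=float)
    n = len(pos)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        d = pos[i + 1:] - pos[i]
        d -= box_size * np.round(d / box_size)  # periodic minimum image
        close = np.nonzero((d ** 2).sum(axis=1) < linking_length ** 2)[0]
        for j in close + i + 1:
            parent[find(int(j))] = find(i)
    return np.array([find(i) for i in range(n)])


# Linking length for the box of Table 1: b = 0.2 L / N.
b_link = 0.2 * 100.0 / 256  # [h^-1 comoving Mpc]

if __name__ == "__main__":
    pts = np.array([[0.0, 0.0, 0.0], [0.05, 0.0, 0.0], [50.0, 50.0, 50.0]])
    print(b_link, fof_groups(pts, b_link, 100.0))
```

A group's mass is then its member count times the particle mass, and groups with fewer than the minimum number of particles are discarded.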

Using this definition, Jenkins et al. (2001) obtained a universal mass function. Assuming that halos are isothermal spheres, this corresponds to a mean density inside the halo of 180 times the background density, although in practice there is a large scatter about this value (White 2002). For our 100 $h^{-1}$ comoving Mpc box with $256^3$ particles, b ≈ 0.08 $h^{-1}$ comoving Mpc. We require that each halo contains at least 31 particles. Figure 4 shows the mean overdensity profiles of dark matter halos hosting quasars for the fiducial mass range $3.0 \pm 1.6\,h^{-1} \times 10^{12}\,M_\odot$ inferred from clustering measurements by Croom et al. (2005), with the corresponding standard deviation at each point. Also shown are analogous curves for some of the least (fiducial mass range divided by ten, $3.0 \pm 1.6\,h^{-1} \times 10^{11}\,M_\odot$) and most massive halos (fiducial mass range multiplied by two, $6.0 \pm 3.2\,h^{-1} \times 10^{12}\,M_\odot$) in the simulation box. For comparison, the virial radius of a halo is given by$^4$



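$$ r_{\rm vir} = 42 \left( \frac{M_{\rm DM}}{10^{12}\,h^{-1}\,M_\odot} \right)^{1/3} \left[ \frac{1 + z}{\ldots} \right]^{-1} {\rm kpc} \qquad (10) $$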



(Barkana & Loeb 2001), which in all cases is much smaller than the radius up to which the halo profiles have a significant impact on the density field, given that most of the Lyα forest arises from fluctuations of order unity. As we show in the next section, the halo infall regions extend to even larger distances.

4.2. Infall and Redshift-Space Distortions

In general, neither the quasar nor the absorbing gas is at rest with respect to the Hubble flow. In particular, quasar host halos grow via the infall of matter toward their centers. This is illustrated in Figure 5, in which we show profiles of relative velocity between the quasar and the absorbing gas computed from the simulations. The peculiar velocities have the consequence of shifting the effective redshift of Lyα absorption of gas parcels through the Doppler effect.

4 This expression is valid at $z > 1$, where the effect of a cosmological
constant can be neglected.

Fig. 4. Profiles of linear overdensity of dark matter halos at
$z = 2$, 3, and 4 (top, middle, and bottom). In each case, 100 lines
of sight were drawn from each of 100 halos with mass in the ranges
$3.0 \pm 1.6~h^{-1} \times 10^{11}~M_\odot$ (light, blue), $3.0 \pm 1.6~h^{-1} \times 10^{12}~M_\odot$
(Croom et al. 2005, black), and $6.0 \pm 3.2~h^{-1} \times 10^{12}~M_\odot$ (heavy,
red) randomly selected in the simulation box. The solid curves
show the profiles averaged over all lines of sight and the bounding
dashed curves show the sample standard deviations of the density
at each point. The profiles extend significantly above the mean cosmic
density ($\delta = 0$) at distances greatly exceeding the halo virial
radii ($\sim 50$ proper kpc). We do not show the case of the most
massive halos at $z = 4$ because the simulation box contains too
few halos in this mass range.

Consider the case where the quasar follows the Hubble
flow and a gas parcel along the line of sight at actual
proper distance $r$ from the quasar has peculiar velocity
$v_\parallel$ ($> 0$ for motion toward the observer) along the line
of sight. This case is sufficiently general for our purpose,
since the peculiar motion of the quasar is simply taken
into account by considering its redshift as determined
from spectroscopy. The important quantity is the relative
velocity of the gas with respect to the quasar along
the line of sight; in practice, the velocity of each mock
quasar is identified with the velocity of the center of mass
of the halo in which it lies.

To first order, the gas parcel will absorb radiation that has
the Lyα frequency (in the quasar rest frame) at proper distance
$r' = r + \Delta r$ from the quasar, where $\Delta r = v_\parallel/H$.
We call $r'$ the "redshift-space" distance. Inserting reasonable
numbers ($z_Q = 3$, $v_\parallel = 300$ km s$^{-1}$; see Figure 5),
we obtain $\Delta r \approx 1$ proper Mpc. The Doppler shift
effect is thus potentially very significant. We incorporate
the effect of peculiar velocities in the usual manner,
which we review here briefly for completeness (see,
e.g., Hui et al. 1997 for more details). Since the ionized
fraction of a given gas parcel depends on the photoionization
rate at its actual proper distance $r$ from the
quasar, we first calculate the neutral hydrogen density
field ignoring redshift-space distortions as in §3.2. Because
of discretization, we obtain a sequence of neutral
densities $\{\rho_0, \rho_1, \ldots, \rho_N\}$, with corresponding peculiar
velocities $\{v_{\parallel,0}, v_{\parallel,1}, \ldots, v_{\parallel,N}\}$ at proper distances
$\{r_0, r_1, \ldots, r_N\}$ from the quasar. For each $i$, we calculate
the redshift-space distance $r'_i = r_i + \Delta r_i$, with
$\Delta r_i = v_{\parallel,i}/H$, for the gas parcel. The redshift-space
density field $\{\rho'_0, \rho'_1, \ldots, \rho'_N\}$ is then constructed such that

$$ \rho'_i = \sum_{j:\ r'_j > 0\ {\rm is\ nearest\ to}\ r_i} \rho_j, \qquad (11) $$

i.e., at each point $i$ of the discretized line-of-sight field we
sum the densities of the gas parcels whose redshift-space
coordinate is positive (the absorption occurs along the
line of sight to the quasar) and nearest to $r_i$.

Fig. 5. Profiles of relative velocity along the line of sight between
the gas and the quasar for halos at $z = 2$, 3, and 4 (top, middle,
and bottom). In each case, 100 lines of sight were drawn from
each of 100 halos with mass in the ranges $3.0 \pm 1.6~h^{-1} \times 10^{11}~M_\odot$
(light, blue), $3.0 \pm 1.6~h^{-1} \times 10^{12}~M_\odot$ (Croom et al. 2005, black),
and $6.0 \pm 3.2~h^{-1} \times 10^{12}~M_\odot$ (heavy, red) randomly selected in
the simulation box. The solid curves show the profiles averaged
over all lines of sight and the bounding dashed curves show the
sample standard deviation of the velocity at each point. Shown
are the absolute values of the true mean velocities, which are negative
owing to systematic infall of the gas toward the halo centers.
The velocities are given in units of $Hr$ because in linear theory
$v_\parallel/Hr \sim \delta$ (cf. Figure 4) and this is the quantity with which the
expected $\Gamma^{\rm bkg}$ bias in proximity effect analyses scales (§5.4). We
do not show the case of the most massive halos at $z = 4$ because
the simulation box contains too few halos in this mass range.
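The mapping of equation (11) can be sketched in a few lines of NumPy. This is an illustrative implementation only, assuming a uniformly spaced grid and dropping parcels that land outside it; the function and argument names are hypothetical.

```python
import numpy as np


def to_redshift_space(r, rho, v_par, hubble):
    """Map real-space neutral densities onto a redshift-space grid (eq. 11).

    r      : uniformly spaced proper distances from the quasar [Mpc]
    rho    : neutral density of each parcel (arbitrary units)
    v_par  : line-of-sight peculiar velocity of each parcel [km/s]
    hubble : Hubble parameter H(z) [km/s/Mpc]
    """
    r = np.asarray(r, dtype=float)
    rho = np.asarray(rho, dtype=float)
    dr = r[1] - r[0]
    r_shifted = r + np.asarray(v_par, dtype=float) / hubble  # r'_j = r_j + v_j / H
    # Index of the grid point nearest to each redshift-space position.
    nearest = np.rint((r_shifted - r[0]) / dr).astype(int)
    # Keep parcels with r' > 0 that land on the grid
    # (discarding off-grid parcels is an assumption of this sketch).
    keep = (r_shifted > 0.0) & (nearest >= 0) & (nearest < r.size)
    rho_z = np.zeros_like(rho)
    np.add.at(rho_z, nearest[keep], rho[keep])  # sum parcels that share a cell
    return rho_z
```

`np.add.at` is used rather than fancy-index assignment so that several parcels mapped to the same cell are accumulated, as the sum in equation (11) requires.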

The thermal motion of the gas particles also causes
redshift-space distortions, broadening the absorption
lines owing to the random velocities. The characteristic
temperature of the IGM is $T_{\rm IGM} \sim 2 \times 10^4$ K
(Ricotti et al. 2000; Schaye et al. 2000; McDonald et al.
2001), corresponding to typical thermal Doppler velocities
$v_{\rm rms} \approx \sqrt{3 k_B T_{\rm IGM}/m_p} \sim 20$ km s$^{-1}$. This is an
order of magnitude less than the typical bulk velocities
of the gas parcels owing to peculiar velocities,
suggesting that this effect has little direct impact on
$\Gamma^{\rm bkg}$ measurements from the proximity effect. However,
it is important to simulate thermal broadening to accurately
reproduce the statistical properties of the Lyα forest.
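The quoted thermal velocity can be checked with a short calculation. The sketch below assumes pure hydrogen (proton mass) and uses standard values of the physical constants; the function name is hypothetical.

```python
import math

K_B = 1.380649e-23      # Boltzmann constant [J/K]
M_P = 1.67262192e-27    # proton mass [kg]


def thermal_rms_velocity_kms(temperature_k):
    """Three-dimensional rms thermal velocity sqrt(3 k_B T / m_p) in km/s."""
    return math.sqrt(3.0 * K_B * temperature_k / M_P) / 1.0e3


if __name__ == "__main__":
    print(f"v_rms(T = 2e4 K) = {thermal_rms_velocity_kms(2.0e4):.1f} km/s")
```

For $T_{\rm IGM} = 2 \times 10^4$ K this gives a value slightly above 20 km s$^{-1}$, an order of magnitude below the 300 km s$^{-1}$ bulk velocities used above.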

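The Doppler profile owing to thermal broadening is
given by

$$ \phi(\nu) = \frac{1}{\sqrt{\pi}\,\Delta\nu_D} \exp\left[ -\frac{(\nu - \nu_{\rm Ly\alpha})^2}{(\Delta\nu_D)^2} \right], \qquad (12) $$

with

$$ \Delta\nu_D = \frac{\nu_{\rm Ly\alpha}}{c} \sqrt{\frac{2 k_B T}{m_p}} \qquad (13) $$

for Lyα absorption by hydrogen atoms (e.g.,
Rybicki & Lightman 1979). A shift $\Delta\nu$ in frequency
corresponds to a shift $\Delta r = c\Delta\nu/(H\nu_{\rm Ly\alpha})$ in effective proper
position of absorption, so that in proper real space the
profile becomes

$$ \phi(\Delta r) = \frac{H \nu_{\rm Ly\alpha}}{\sqrt{\pi}\, c\, \Delta\nu_D} \exp\left[ -\frac{(H \nu_{\rm Ly\alpha} \Delta r / c)^2}{(\Delta\nu_D)^2} \right]. \qquad (14) $$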



For each gas parcel we use the temperature $T$ as given
by the equation of state (eq. 8) for the undistorted density
field. For each $j$, we then assign a fraction
$\phi(r_j - r_i)\Delta r_i$
of the neutral density of parcel $i$ to the cell at position
$r_j$, where $\Delta r_i$ is the proper spacing between neighboring
cells. When modeling both peculiar velocities and thermal
broadening simultaneously, the numbers of cells by
which each partial gas parcel is shifted owing to peculiar
motion and thermal broadening are added. Note that we
assume a thermal profile here, and hence ignore the natural
line width, but this is accurate for the low column
density Lyα forest (e.g., Hui et al. 1997).
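The deposition step can be illustrated as a matrix whose column $i$ holds the fractions $\phi(r_j - r_i)\Delta r$ of parcel $i$ assigned to each cell $j$. The sketch below is an assumption-laden illustration, not the original code: it takes a uniform grid, expresses the Doppler width of equation (14) as $b/H$ with $b = \sqrt{2 k_B T/m_p}$, and uses hypothetical function names.

```python
import numpy as np

K_B = 1.380649e-23    # Boltzmann constant [J/K]
M_P = 1.67262192e-27  # proton mass [kg]


def thermal_kernel(r, temperature_k, hubble):
    """Matrix K[j, i] = phi(r_j - r_i) * dr from eq. (14), one column per parcel i.

    r             : uniformly spaced proper distances [Mpc]
    temperature_k : scalar or per-parcel temperature [K]
    hubble        : Hubble parameter H(z) [km/s/Mpc]
    """
    r = np.asarray(r, dtype=float)
    dr = r[1] - r[0]
    temp = np.broadcast_to(np.asarray(temperature_k, dtype=float), r.shape)
    # Doppler parameter b = c * dnu_D / nu_Lya = sqrt(2 k_B T / m_p), in km/s.
    b = np.sqrt(2.0 * K_B * temp / M_P) / 1.0e3
    width = b / hubble                   # Doppler width in proper Mpc, per parcel
    sep = r[:, None] - r[None, :]        # r_j - r_i
    phi = np.exp(-(sep / width[None, :]) ** 2) / (np.sqrt(np.pi) * width[None, :])
    return phi * dr


def thermally_broaden(r, rho, temperature_k, hubble):
    """Spread each parcel's neutral density over neighboring cells."""
    return thermal_kernel(r, temperature_k, hubble) @ np.asarray(rho, dtype=float)
```

Each column sums to approximately one when the grid spacing is small compared with the Doppler width, so the total neutral density is conserved away from the grid edges.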

4.3. Optical Depth PDF vs. Halo Mass 

It is instructive to examine how the optical depth PDF
varies with halo mass at different distances from the
quasars when the overdensity and gas infall associated
with dark matter host halos are modeled as above. This
is illustrated in Figure 6, in which we show results for dark
matter halos of mass $3.0 \pm 1.6~h^{-1} \times 10^{11}~M_\odot$ (light),
$3.0 \pm 1.6~h^{-1} \times 10^{12}~M_\odot$ (Croom et al. 2005), and
$6.0 \pm 3.2~h^{-1} \times 10^{12}~M_\odot$ (heavy) at $z = 3$, along with
fits to the trend of mean $\ln\tau$ with halo mass for the
range of masses probed by our simulation. We also show
the case in which quasars lie in random locations, which
can be clearly distinguished from the massive halos. The
optical depth PDFs are shifted to higher values with
increasing halo mass, which is expected since the matter
overdensity should increase with halo mass. The
significant effect of quasar environments on the optical
depth statistics in the proximity regions of quasars suggests
that existing proximity effect analyses which have
neglected these environments will not necessarily recover
the correct $\Gamma^{\rm bkg}$. We make analytic estimates of the
expected biases and quantify these more accurately with mock
spectra in §5.5. The environmental dependence of the
optical depth statistics also raises the possibility of using
the latter to probe quasar environments, in particular to
measure the mass of the host halos. We turn to this
possibility in §7. For the moment, we
note that the effect of halos on the optical depth statistics
is to "separate" the PDFs with mass. This is non-degenerate
with varying $\Gamma^{\rm bkg}$, which just shifts PDFs of
$\ln\tau$ by the same amount, regardless of the halo mass.$^5$
We must note that although the optical depth statistics
near quasars differ by large factors for different host halo
masses, the absolute optical depth differences are small,
of order $\Delta\tau \sim 0.01$. Measuring these differences in actual
spectra, with noise and for which the continuum level
must be estimated from the data, may be challenging in
practice, as we discuss in §8.

4.4. Clustering 

A possible bias related to overdensities is the clustering
of galaxies and other quasars around proximity effect
quasars. If quasars indeed form in large-scale overdensities,
then the probability of other quasars and galaxies
forming in the vicinity is increased with respect to average
regions of the Universe, because there it is easier
for local density peaks ("peaks within peaks") to cross
the threshold for collapse (Bond et al. 1991). The clustering
of galaxies around quasars has in fact been both
predicted in simulations of galaxy and quasar formation
(e.g., Kauffmann & Haehnelt 2002) and observationally
measured over a wide redshift interval (e.g., Croom et al.
2004 at $z < 0.3$; Coil et al. 2006 at $z \sim 1$; Adelberger
& Steidel 2005a at $1.5 < z < 3.5$). Quasars, which
are generally thought to be hosted by galaxies, have
been shown to exhibit similar clustering among themselves
(e.g., Croom et al. 2001, 2002; Wake et al. 2004;
Croom et al. 2005).

We have thus far identified $\Gamma^{\rm bkg}$ with the total
photoionization rate near quasars contributed by all other
light sources, implicitly assuming that this quantity is
equal to the total photoionization rate averaged over
large regions of the Universe (the "true" background
rate). Since galaxies and quasars (which together presumably
dominate the contribution to the photoionizing
background) cluster around quasars, we in fact expect
$\Gamma^{\rm bkg}$ near quasars to be higher than the true background
$\Gamma^{\rm bkg}$. In this section, we quantitatively estimate this effect.

5 Ignoring redshift-space distortions, $\tau \propto n_{\rm HI} \propto (\Gamma^{\rm tot})^{-1}$ (§3.2).

Fig. 6. Optical depth PDFs at increasing redshift-space proper distance from quasars of typical luminosity at $z = 3$. In each
case, the solid-line histograms show PDFs approximated from 10 lines of sight drawn from each of 100 halos with masses in the ranges
$3.0 \pm 1.6~h^{-1} \times 10^{11}~M_\odot$ (light, blue), $3.0 \pm 1.6~h^{-1} \times 10^{12}~M_\odot$ (Croom et al. 2005, black), and $6.0 \pm 3.2~h^{-1} \times 10^{12}~M_\odot$ (heavy, red)
randomly selected in the simulation box. The dashed orange PDFs show the case of quasars lying in random locations in the box. This
situation can be clearly distinguished from the cases where the quasars lie in massive dark matter halos. Note, however, that although the
optical depth statistics near quasars differ by large factors for different host halo masses, the absolute optical depth differences are small,
of order $\Delta\tau \sim 0.01$. We discuss in §8 observational difficulties which make such differences challenging to measure in practice. In general,
the optical depth PDFs are shifted to higher values as the halo mass is increased. Below each panel, we show how the sample mean $\ln\tau$
varies with mass for 10 lines of sight from each of 10 halos in bins of logarithmic width $\Delta \ln M_{\rm DM} = 0.05$ (10 times finer than the fiducial
Croom et al. mass range) along with the least-squares log-log linear fit. The slope of the linear fit compared to the scatter between the
dots provides an estimate of how well halos of different masses can be distinguished with 100 lines of sight, using only one data point at
the given distance from each spectrum. The results of full mock mass likelihood analyses are presented in §7.

Let us first suppose that the ionizing background is
dominated by emission from star-forming LBGs, as may
be the case at $z \sim 3$ (e.g., Steidel et al. 2001). Consider
the photoionizing flux at a given position owing to all
sources of light in the Universe other than the quasar in
an idealized model of $N_G$ isotropic sources, each with specific
luminosity $L_{\nu,G}$, distributed in a spherical volume
$V = 4\pi R^3/3$ with quasar-galaxy correlation function

$$ \xi_{QG}(r) = \left( \frac{r}{r_0} \right)^{-\gamma}. \qquad (15) $$



Cosmological expansion can be neglected in the argument,
since for $z > 2$ the radiation is largely local, i.e.,
radiation from sources at higher redshifts is absorbed and can be
neglected (Madau et al. 1999). Let $n_G$ be the mean number
density of sources and $f_{\nu,G} = L_{\nu,G}/4\pi r^2$ be the specific
flux at distance $r$ from each source. Suppose that
a quasar is located at the origin of the volume. The total
specific flux owing to the other sources at a point
displaced by the vector $\mathbf{r}$ from the quasar is

$$ f^{\rm other}_{\nu,G}(\mathbf{r}) = \sum_{i=1}^{N_G} f_{\nu,G}(|\mathbf{r} - \mathbf{r}_i|) = \sum_{i=1}^{N_G} \frac{L_{\nu,G}}{4\pi |\mathbf{r} - \mathbf{r}_i|^2}, \qquad (16) $$



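where the sums are over the sources other than the
quasar and $\mathbf{r}_i$ is the position of source $i$. Since we will
eventually use this quantity for the sole purpose of calculating
a photoionization rate, we need not worry that
light rays incident at $\mathbf{r}$ are not necessarily parallel, i.e.,
the scalar sum of the specific fluxes contains all the relevant
information. To first order, we have, averaging over
$\mathbf{r}_i$,

$$ \langle f^{\rm other}_{\nu,G} \rangle(\mathbf{r}) = \frac{L_{\nu,G}}{4\pi} \sum_{i=1}^{N_G} \langle |\mathbf{r} - \mathbf{r}_i|^{-2} \rangle, \qquad (17) $$

where

$$ \langle |\mathbf{r} - \mathbf{r}_i|^{-2} \rangle = V^{-1} \int_0^R \int_0^\pi \int_0^{2\pi} d(|\mathbf{r} - \mathbf{r}_i|)\, d\theta\, d\phi\, \sin\theta \qquad (18) $$

$$ \times\, [1 + \xi_{QG}(r_i)]. \qquad (19) $$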

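The integral above is over a sphere of radius $R$ centered
on $\mathbf{r}$: the $|\mathbf{r} - \mathbf{r}_i|^2$ factor of the spherical volume element
cancels the $|\mathbf{r} - \mathbf{r}_i|^{-2}$ term that is integrated over. Using
$r_i^2 = r^2 + |\mathbf{r} - \mathbf{r}_i|^2 - 2r|\mathbf{r} - \mathbf{r}_i|\cos\theta$ (for a suitable choice
for the orientation of the coordinate system), we may
evaluate

$$ \langle f^{\rm other}_{\nu,G} \rangle(r) = \langle f^{\rm other,smooth}_{\nu,G} \rangle + \langle f^{\rm other,clust}_{\nu,G} \rangle(r), \qquad (20) $$

where

$$ \langle f^{\rm other,smooth}_{\nu,G} \rangle = n_G L_{\nu,G} R \qquad (21) $$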



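is the contribution of the smooth background component
(arising from the factor 1 in eq. 18) and

$$ \langle f^{\rm other,clust}_{\nu,G} \rangle(r) = \frac{n_G L_{\nu,G}}{2} \int_0^R \int_0^\pi d(|\mathbf{r} - \mathbf{r}_i|)\, d\theta\, \sin\theta\, \xi_{QG}(r_i) \qquad (22) $$

is the component (arising from the $\xi_{QG}$ term in eq. 18)
owing to the clustering of galaxies around the quasar. For
both the smooth and clustering components, we define
the photoionization rates as calculated from the average
contributions:

$$ \Gamma^{{\rm bkg},x} = \int_{\nu_{\rm HI}}^{\infty} d\nu\, \sigma(\nu)\, \frac{\langle f^{{\rm other},x}_{\nu,G} \rangle}{h_P \nu}. \qquad (23) $$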



Are the local fluctuations in the photoionizing flux owing
to the quasar-galaxy correlations important when attempting
to measure $\Gamma^{\rm bkg}$ using the proximity effect? To
answer this question, we note that the proximity effect is
sensitive to $\Gamma^{\rm bkg} = \Gamma^{\rm bkg,smooth} + \Gamma^{\rm bkg,clust}$ and consider
the ratio:

$$ \frac{\Gamma^{\rm bkg,clust}}{\Gamma^{\rm bkg,smooth}}(R, r) = \frac{\int_0^R \int_0^\pi d(|\mathbf{r} - \mathbf{r}_i|)\, d\theta\, \sin\theta\, \xi_{QG}(r_i)}{2R}. \qquad (24) $$

Note that this ratio is purely geometrical, depending on
only the attenuation radius and the quasar-galaxy correlation
function. In particular, it is independent of the
abundance and brightness of the galaxies. The cancellations
occur because we are assuming that the smooth
background and the fluctuations are produced by the
same sources.

In the top panel, Figure 7 shows $\Gamma^{\rm bkg,smooth}$, $\Gamma^{\rm bkg,clust}$,
and $\Gamma^{\rm QSO}$ as a function of distance from the quasar for
the order-of-magnitude estimate $\Gamma^{\rm bkg} = 10^{-12}$ s$^{-1}$ (see
Figure 1) and quasars of typical luminosities at $z = 3$.
For the quasar-galaxy correlation function, we adopt
$r_0 = 5~h^{-1}$ comoving Mpc and $\gamma = 1.6$, consistent with
the results of Adelberger & Steidel (2005b) for LBGs
at $z \sim 3$. The characteristic value of $R$ is the
mean free path $l_{\rm mfp}$ of ionizing photons, i.e., the radius
beyond which incoming ionizing photons are exponentially
suppressed. Madau et al. (1999) calculate
$l_{\rm mfp} \approx 33[(1 + z)/4]^{-4.5}$ proper Mpc for Lyman limit
photons, which we take as characteristic of ionizing soft
galactic emission.
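The geometrical ratio of equation (24) is straightforward to evaluate numerically for a power-law correlation function. The following sketch is illustrative only: it performs the polar-angle integral analytically and the radial integral with a midpoint rule, requires all lengths in the same units, and converts $r_0$ to proper Mpc with an assumed $h = 0.7$ (a value not specified in this section). Function names are hypothetical.

```python
import numpy as np


def mean_free_path_mpc(z):
    """Lyman-limit mean free path in proper Mpc, as quoted from Madau et al. (1999)."""
    return 33.0 * ((1.0 + z) / 4.0) ** -4.5


def clustering_ratio(r, R, r0, gamma, n_steps=20000):
    """Gamma^{bkg,clust} / Gamma^{bkg,smooth} from eq. (24) for xi(x) = (x / r0)**(-gamma).

    r, R and r0 must share the same length units.
    """
    ds = R / n_steps
    s = (np.arange(n_steps) + 0.5) * ds   # midpoints of |r - r_i|
    p = 2.0 - gamma
    # Analytic result of int_0^pi sin(theta) * xi(r_i) dtheta,
    # with r_i^2 = r^2 + s^2 - 2 r s cos(theta).
    angular = r0**gamma * ((r + s) ** p - np.abs(r - s) ** p) / (p * r * s)
    return angular.sum() * ds / (2.0 * R)


if __name__ == "__main__":
    z, h = 3.0, 0.7                       # h = 0.7 is an assumed value
    r0 = 5.0 / h / (1.0 + z)              # 5 h^-1 comoving Mpc -> proper Mpc
    R = mean_free_path_mpc(z)
    for r in (1.0, 7.0):
        print(f"r = {r} proper Mpc: ratio = {clustering_ratio(r, R, r0, gamma=1.6):.3f}")
```

A useful sanity check is the limit $\gamma = 0$, for which $\xi = 1$ everywhere and the ratio must equal unity.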

In the bottom panel, the ratios of $\Gamma^{\rm bkg,clust}$ and
$\Gamma^{\rm QSO}$ to $\Gamma^{\rm bkg,smooth}$ are shown. It is seen that
$\Gamma^{\rm bkg,clust}/\Gamma^{\rm bkg,smooth} < 20\%$ for $r > 1$ Mpc and $< 5\%$
for $r > 7$ Mpc (which is much less than the factor $> 2$ discrepancy
between proximity effect and flux decrement
measurements of $\Gamma^{\rm bkg}$), except at very small distances
from the quasar, where it diverges. This divergence is
most likely benign, as it occurs on a scale much smaller
than the proximity region (i.e., the region where $\Gamma^{\rm bkg} \sim
\Gamma^{\rm QSO}$, which contains significant information about $\Gamma^{\rm bkg}$)
and in any case is an artifact of assuming that the correlation
function (eq. 15) maintains its power-law behavior
as $r \to 0$. In reality, $\Gamma^{\rm bkg,clust}$ is expected to be bounded
above on physical grounds. Moreover, note that in the
region where $\Gamma^{\rm bkg,clust} \sim \Gamma^{\rm bkg,smooth}$, the quasar flux already
dominates over the locally enhanced background
flux by a large factor. We thus conclude that clustering
of galaxies around quasars is unlikely to significantly affect
measurements of $\Gamma^{\rm bkg}$ using the proximity effect, at
least at $z = 3$ for a background which is dominated by
the contribution from LBGs.

Fig. 7. Top: Comparison of the contributions to
the total hydrogen photoionizing rate owing to the background
($\Gamma^{\rm bkg} = 10^{-12}$ s$^{-1}$; horizontal red line), the quasar (solid blue
curves; the central thick curve corresponds to the mean typical
quasar luminosity and the bounding thin curves indicate ranges
of one standard deviation), and the local excess of galaxies owing
to AGN-galaxy clustering (assuming that the background is dominated
by emission from LBGs; green dashed curve) as a function
of distance from the quasar at $z = 3$. Bottom: Ratios of the local
contributions from the quasar (blue) and the clustering of galaxies
around the quasar (red) to the true photoionizing background.
The horizontal dotted line shows the 5% level.

What if quasars contribute significantly to $\Gamma^{\rm bkg}$, and
how is the situation modified at different redshifts? Consider
the case of a background dominated by quasars.
Then we must consider the quasar-quasar correlation
function, but it is in fact similar to the quasar-LBG one
(e.g., Croom et al. 2005). Let us denote it generically by
$\xi(r)$. The ratio in equation (24) is then only modified by
the attenuation length $R$. For a galaxy-dominated background,
we took $R$ to be the mean free path of Lyman-limit
photons, but the appropriate $R$ for quasars will be
larger owing to their harder spectra and the longer mean
free paths for high energy quasar photons. Since the correlation
function $\xi(r)$ is a strictly decreasing function of $r$,

$$ \left( \frac{\Gamma^{\rm bkg,clust}}{\Gamma^{\rm bkg,smooth}} \right)_{QQ} < \left( \frac{\Gamma^{\rm bkg,clust}}{\Gamma^{\rm bkg,smooth}} \right)_{QG}, \qquad (25) $$

where the subscripts QQ and QG refer to the quasar-
and galaxy-dominated cases, respectively. The clustering
effect is therefore even less important in the quasar-dominated
case. For the general case in which both
galaxies and quasars contribute significantly to the background,
$\Gamma^{\rm bkg}$ is a linear superposition of the galaxy and
quasar contributions and so the total clustering contribution
is again negligible. For redshifts $z < 3$, the attenuation
length is increased because the number density of
absorbers is decreased by cosmic expansion, so the result
still holds. In the limit of large redshifts, $R \to 0$
and $\Gamma^{\rm bkg,clust}/\Gamma^{\rm bkg,smooth} \to \xi(r)$, so that the result must
eventually break down. However, we are here mainly
concerned with explaining the discrepancy between proximity
effect and flux decrement measurements of $\Gamma^{\rm bkg}$ at
$z < 3$ (Figure 1). The argument given so far indicates
that clustering has a negligible effect, and we therefore
do not model it or pursue the issue further.

5. BIASES IN PROXIMITY EFFECT MEASUREMENTS OF $\Gamma^{\rm bkg}$

To understand why the proximity effect measurements
of $\Gamma^{\rm bkg}$ are systematically higher than the flux decrement
measurements of the same quantity, we first formulate a
statistical method for estimating $\Gamma^{\rm bkg}$ from the proximity
effect, with physical assumptions mimicking those made
in previous proximity effect measurements of $\Gamma^{\rm bkg}$. In
contrast to most previous analyses, we consider the optical
depth statistics in the proximity region of quasar
spectra instead of the differential number of absorption
lines as a function of redshift, $dN/dz$. Following the seminal
work of Bajtlik et al. (1988), the differential number
of absorption lines has been used in almost all studies
of the proximity effect to date. However, absorption
lines arising from discrete absorbers are no longer consistent
with the modern picture of the Lyα forest as arising
from smooth density fluctuations in the IGM. Moreover,
the counting of absorption lines is a somewhat ill-defined
and uncertain procedure, making it difficult to treat with
statistical rigor. On the other hand, once the quasar
continuum is estimated, the optical depth at any point
can be simply measured as $\tau = -\ln(F^{\rm obs}/F^{\rm cont})$,$^6$ where
$F^{\rm obs}$ is the observed flux and $F^{\rm cont}$ is the continuum
level. As we will see, it is also easy to treat the optical
depth in a statistically sound framework.
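The optical depth measurement itself is a one-line transformation of the normalized flux. The sketch below (hypothetical function name; NaN masking of non-positive pixels is a choice made here, not a procedure described in the text) also illustrates how sensitive $\tau$ is to the continuum placement.

```python
import numpy as np


def optical_depth(flux_obs, flux_cont):
    """Return tau = -ln(F_obs / F_cont); pixels with a non-positive ratio give NaN."""
    ratio = np.atleast_1d(
        np.asarray(flux_obs, dtype=float) / np.asarray(flux_cont, dtype=float)
    )
    tau = np.full(ratio.shape, np.nan)
    positive = ratio > 0.0
    tau[positive] = -np.log(ratio[positive])
    return tau


if __name__ == "__main__":
    flux = np.array([0.9, 0.5, 0.1])
    print(optical_depth(flux, 1.0))
    # Shift in tau caused by placing the continuum 5% too high.
    print(optical_depth(flux, 1.05) - optical_depth(flux, 1.0))
```

A 5% error in the continuum shifts every optical depth by $\ln 1.05 \approx 0.05$, which is larger than the $\Delta\tau \sim 0.01$ differences between host halo masses noted in §4.3.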

To estimate the biases introduced by the local overdensities
around, and the infall of gas toward, the quasars
(the two effects which are most likely to play an important
role according to our discussion in §3.2) in analyses
which have neglected those complications, we apply our
measurement method to mock spectra (with a known
value of $\Gamma^{\rm bkg}$) in which those effects are modeled. We
begin with some remarks on the optical depth probability
density function (PDF) in the general Lyα forest in
§5.1 and formulate our statistical formalism in §5.2. In
§5.3, we provide order-of-magnitude analytic estimates
of the expected biases, before quantifying those more accurately
using mock spectra in §5.5.



5.1. Lyman-α Forest Optical Depth PDF

Before proceeding with defining a method for inferring
$\Gamma^{\rm bkg}$ from optical depth statistics in the proximity region,
it is useful to consider the optical depth PDF in the general
Lyα forest. Coles & Jones (1991) first used a lognormal
PDF to describe the dark matter density field and
showed that it follows plausibly from the assumption of
initial linear Gaussian density and velocity fluctuations. Bi et al.
(1992) and Bi & Davidsen (1997) applied this approach
to baryons also and showed that the resulting Lyα forest
matched the observations well. Several observational
studies, e.g., Becker et al. (2006) and Desjacques et al.
(2007), have also found that the optical depth PDF in
the Lyα forest is close to lognormal.

Figure 8 shows normalized histograms of the optical
depth constructed from random lines of sight through
our simulation box at $z = 2$, 3, and 4, along with the
corresponding lognormal distribution as estimated from
the mean logarithm of the optical depth and its standard
deviation. Although the simulated optical depth
distribution deviates noticeably from a lognormal shape,
especially at low redshifts, the main features of
the distributions are well captured by a lognormal. The
lognormal provides a convenient analytic form, significantly
simplifying the discussion, and so we will use it in
making estimates of the biases that arise when naively
measuring $\Gamma^{\rm bkg}$ from the proximity effect. Our results
should be robust to this assumption as, in this analysis,
the information about $\Gamma^{\rm bkg}$ is contained in the scaling
equation (1).

6 Because of the redshift-space distortions, this quantity is not
directly proportional to HI density as in equation (9). It is best
interpreted as a redshift-space effective optical depth, or taken as
the definition of the optical depth in this work.

Fig. 8. Histograms of the optical depth as constructed from
random lines of sight in the simulation box at $z = 2$, 3, and 4
(top, middle, and bottom panels). Redshift-space distortions have
been modeled as in §4.2. The solid curves show the corresponding
lognormal distribution as estimated from the mean logarithm of
the optical depth and its standard deviation.

5.2. Maximum Likelihood Method for $\Gamma^{\rm bkg}$

We now formulate a maximum likelihood method to
measure $\Gamma^{\rm bkg}$ from optical depth statistics in the proximity
regions of Lyα absorption spectra. While the statistical
formalism differs from the BDO line-counting
method, we make essentially the same physical assumptions.
Thus, our conclusions should also provide insight
into the validity of studies which used the line-counting
approach, i.e., virtually all existing proximity effect analyses.
Mathematically, the explicit assumption is
that the Lyα optical depth in the presence of the quasar,
$\tau^{\rm prox}$, is related to the optical depth if the quasar were
turned off, $\tau^{\rm off}$, by

$$ \tau^{\rm prox} = \frac{\tau^{\rm off}}{1 + \omega(Q; r)}, \qquad (26) $$



where $\omega(Q; r) = \Gamma^{\rm QSO}(Q; r)/\Gamma^{\rm bkg}(z_Q)$, $\Gamma^{\rm QSO}(Q; r)$ is
the contribution of the quasar to the total photoionization
rate at distance $r$ from it, and $Q = \{z_Q, L_Q, \alpha_Q\}$
specifies the redshift, luminosity, and spectral index of
the quasar under consideration. We call this method
the "τ-scaling" likelihood analysis. Owing to the small
size of the proximity region, $r$ can be taken to be the
proper distance. Equation (26) is just what would be
obtained from photoionization equilibrium assuming that
the quasar lies in a random location and neglecting all
redshift-space distortions. In what follows, we simplify
the notation and use $\tau$ to denote $\tau^{\rm off}$.

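Using equation (26), we can relate the optical depth
statistics in the Lyα forest away from the quasar, but
at approximately equal redshift ($z \approx z_Q$), to those inside
the proximity region. For a lognormal $\tau$ distribution with
mean logarithm $\langle \ln\tau \rangle$ and standard deviation, also in the
logarithm, $\sigma_{\ln\tau}$, we have:

$$ f_\tau(z_Q; \tau) = \frac{1}{\sqrt{2\pi}\,\sigma_{\ln\tau}\,\tau} \exp\left[ -\frac{(\langle \ln\tau \rangle - \ln\tau)^2}{2\sigma^2_{\ln\tau}} \right], \qquad (27) $$

which translates to a lognormal distribution in the proximity
region given by:

$$ f_{\tau^{\rm prox}}(Q, r; \tau^{\rm prox}) = \frac{1 + \omega(Q; r)}{\sqrt{2\pi}\,\sigma_{\ln\tau}\,[1 + \omega(Q; r)]\,\tau^{\rm prox}} \exp\left[ -\frac{(\langle \ln\tau \rangle - \ln[(1 + \omega(Q; r))\tau^{\rm prox}])^2}{2\sigma^2_{\ln\tau}} \right], \qquad (28) $$

where the Jacobian factor $1 + \omega$ in the numerator cancels against the
$1/\tau$ factor of equation (27) evaluated at $\tau = (1 + \omega)\tau^{\rm prox}$.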



We define a correlation length, $r_{\rm corr}$, such that points
separated by this distance can be assumed to have independent
optical depths. Then, for a given quasar spectrum,
we estimate $\langle \ln\tau \rangle$ and $\sigma_{\ln\tau}$ using the usual unbiased
estimators

$$ \widehat{\langle \ln\tau \rangle} = \frac{1}{N} \sum_{i=1}^{N} \ln\tau_i \qquad (29) $$

and

$$ \hat{\sigma}^2_{\ln\tau} = \frac{1}{N - 1} \sum_{i=1}^{N} \left( \ln\tau_i - \widehat{\langle \ln\tau \rangle} \right)^2, \qquad (30) $$

where $\tau_i$ are optical depths at points separated by $r_{\rm corr}$
outside the proximity region.
We may now construct the likelihood function

$$ \mathcal{L}[\Gamma^{\rm bkg}(z)] = \prod_Q \prod_i f_{\tau^{\rm prox}}(\tau^{\rm prox}_i), \qquad (31) $$

where $f_{\tau^{\rm prox}}(\tau^{\rm prox})$ has the same parameter dependence
as in equation (28). Here, the outer product is over the
different quasars in a sample. The inner product is over
data points in the proximity region of quasar $Q$ (defined



to be between 



and 



from the quasar), again 



separated by a proper distance $r_{\rm corr}$. Each factor is proportional
to the probability of obtaining a realization of
the optical depth given the adopted model for $\Gamma^{\rm bkg}$. The
terms in the products are assumed to be evaluated at independent
points, so that the likelihood is proportional
to the probability of obtaining all the measured optical
depths given the model. By Bayes's theorem, this is
proportional (for a flat prior) to the probability that the
assumed model is correct. After normalization, the likelihood function
gives the PDF for a given model for $\Gamma^{\rm bkg}$ to be correct.
The likelihood estimator in equation (31) is suboptimal,
since it does not make use of correlated data points. The
estimated likelihood may therefore be wider (less constraining)
than could be achieved in principle. The design
of an optimal estimator is outside of the scope of this
paper. In §8 we discuss practical considerations arising
with actual quasar spectra of finite resolution and signal-to-noise
ratio, which are likely to be more important than
the optimality of the likelihood estimator.
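To make the procedure concrete, the following sketch applies the τ-scaling likelihood to a toy mock data set. Everything about the mock is an assumption made for illustration (lognormal $\tau$ with $\langle\ln\tau\rangle = -1$ and $\sigma_{\ln\tau} = 1$, a quasar rate falling as $r^{-2}$ and equal to the background at 5 proper Mpc, quasars in random locations with no infall); the function names are hypothetical and this is not the original analysis code.

```python
import numpy as np


def log_likelihood(gamma_bkg, tau_prox, gamma_qso, mean_ln_tau, sigma_ln_tau):
    """Log of the tau-scaling likelihood (eq. 31) for one trial Gamma^bkg."""
    omega = gamma_qso / gamma_bkg                     # omega = Gamma^QSO / Gamma^bkg
    ln_tau_off = np.log((1.0 + omega) * tau_prox)     # invert the scaling of eq. (26)
    resid = (mean_ln_tau - ln_tau_off) / sigma_ln_tau
    # Lognormal PDF of tau_prox: the (1 + omega) Jacobian cancels against
    # the 1/tau factor evaluated at tau = (1 + omega) * tau_prox.
    ln_pdf = -0.5 * resid**2 - np.log(np.sqrt(2.0 * np.pi) * sigma_ln_tau * tau_prox)
    return ln_pdf.sum()


def estimate_gamma_bkg(trial_values, tau_prox, gamma_qso, tau_forest):
    """Maximize the likelihood over a grid of trial Gamma^bkg values."""
    ln_tau = np.log(tau_forest)
    mean_ln_tau = ln_tau.mean()            # eq. (29)
    sigma_ln_tau = ln_tau.std(ddof=1)      # square root of eq. (30)
    logl = np.array([
        log_likelihood(g, tau_prox, gamma_qso, mean_ln_tau, sigma_ln_tau)
        for g in trial_values
    ])
    return trial_values[np.argmax(logl)], logl


def make_mock(rng, gamma_true, n_points=2000, n_forest=5000):
    """Hypothetical mock: lognormal tau, quasar rate falling as r**-2."""
    r = rng.uniform(1.0, 20.0, n_points)              # proper Mpc
    gamma_qso = gamma_true * (5.0 / r) ** 2           # equals the background at 5 Mpc
    tau_off = rng.lognormal(mean=-1.0, sigma=1.0, size=n_points)
    tau_prox = tau_off / (1.0 + gamma_qso / gamma_true)
    tau_forest = rng.lognormal(mean=-1.0, sigma=1.0, size=n_forest)
    return tau_prox, gamma_qso, tau_forest


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    tau_prox, gamma_qso, tau_forest = make_mock(rng, gamma_true=1.0e-12)
    trials = np.logspace(-13, -11, 201)
    best, _ = estimate_gamma_bkg(trials, tau_prox, gamma_qso, tau_forest)
    print(f"recovered Gamma^bkg = {best:.2e} s^-1")
```

Because the toy mock obeys the assumed scaling exactly, the estimate should scatter around the input value; the biases discussed in §5.3 arise only once overdensities and infall are added to the mock spectra.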

5.3. Analytic Estimates of Biases 

Before applying the τ-scaling likelihood to mock spectra,
we make order-of-magnitude estimates for the biases
that may be expected when the method is applied to
spectra with quasars lying in overdense regions and with
redshift-space distortions. Our estimates are based on
the fact that the likelihood analysis is sensitive to the
$(1 + \omega(Q; r))^{-1}$ scaling assumed for $\tau$ in the proximity
region (eq. 26). The local overdensities and infall regions
around quasars distort this scaling, leading to an incorrect
estimate of $\omega$. Assuming that $\Gamma^{\rm QSO}(r)$ is known
(e.g., from measurements of the magnitude and spectrum
of the quasar), this leads to an incorrect estimate of $\Gamma^{\rm bkg}$.

5.3.1. Bias Owing to Local Overdensity 

Consider first the case where the quasar lies in an
overdense region, but where redshift-space distortions
are ignored. Let us denote by $\omega^{\rm true}(r)$ the true ratio
$\Gamma^{\rm QSO}(Q; r)/\Gamma^{\rm bkg}$ and by $\omega^{\rm app}(r)$ the ratio as inferred
when assuming the scaling given by equation (26).
Denote the overdensity $\Delta(r) = \rho(r)/\langle\rho\rangle$. Then, using
$\tau \propto n_{\rm HI} \propto \Delta^{2-0.7\beta}$ (§3.2), we expect the
optical depth scaling to be approximately modified to

$$ \tau^{\rm prox} = \frac{\Delta^{2-0.7\beta}\,\tau}{1 + \omega^{\rm true}}. \qquad (32) $$

Assuming the scaling of equation (26) amounts to incorrectly
assuming that $\Delta = 1$, from which we obtain the
relation

$$ \Delta^{2-0.7\beta} (1 + \omega^{\rm true})^{-1} = (1 + \omega^{\rm app})^{-1}, \qquad (33) $$

and hence

$$ \frac{\Gamma^{\rm bkg,true}}{\Gamma^{\rm bkg,app}} = \frac{\omega^{\rm app}}{\omega^{\rm true}}. \qquad (34) $$

The typical distance $r_{\rm typ}$ from the quasar at which information
about $\Gamma^{\rm bkg}$ is encoded in the proximity effect
is such that the photoionizing rate owing to the quasar
is comparable to the background. We take it to satisfy
$\omega^{\rm true}(r_{\rm typ}) = 1$, so that, solving equation (33) for $\omega^{\rm app}$,

$$ \frac{\Gamma^{\rm bkg,true}}{\Gamma^{\rm bkg,app}} = 2\Delta^{0.7\beta-2} - 1. \qquad (35) $$

It will be useful to express the bias in terms of the linear
overdensity $\delta = \Delta - 1$; taking the reciprocal of equation
(35), we have the bias owing to the local overdensity to
first order in $\delta$:

$$ B^{\rm od} \equiv \frac{\Gamma^{\rm bkg,app}}{\Gamma^{\rm bkg,true}} \approx 1 + 3.1\,\delta(r_{\rm typ}) \qquad (36) $$

for $\beta = 0.62$, the late reionization limit.
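The first-order coefficient in equation (36) follows from expanding the reciprocal of equation (35), which gives $2(2 - 0.7\beta) \approx 3.1$ for $\beta = 0.62$. The short sketch below (hypothetical function names) compares the exact reciprocal with the linearized form.

```python
def overdensity_bias_exact(delta, beta=0.62):
    """Gamma^app / Gamma^true at omega^true = 1: the reciprocal of eq. (35)."""
    big_delta = 1.0 + delta
    return 1.0 / (2.0 * big_delta ** (0.7 * beta - 2.0) - 1.0)


def overdensity_bias_linear(delta, beta=0.62):
    """First-order expansion in delta, as in eq. (36)."""
    return 1.0 + 2.0 * (2.0 - 0.7 * beta) * delta


if __name__ == "__main__":
    for delta in (0.01, 0.05, 0.1):
        print(delta, overdensity_bias_exact(delta), overdensity_bias_linear(delta))
```

The two forms agree closely for small $\delta$; the exact expression grows faster than the linear one as $\delta$ increases, so equation (36) should be read as an order-of-magnitude guide.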

5.3.2. Bias Owing to Gas Infall 

Consider now the bias owing to the redshift-space distortions
caused by gas infall toward the centers of halos.
To simplify the estimate, we ignore the local overdensity
here. As explained in §4.2, a gas parcel at proper distance
$r$ from the quasar and with velocity $v_\parallel$ along the
line of sight will, owing to the Doppler effect, appear to
be absorbing from a distance $r' = r + \Delta r$ from the quasar,
with $\Delta r = v_\parallel/H$. In this case, assuming the scaling of
equation (26) results in the incorrect identification

$$ \omega^{\rm true}(r) = \omega^{\rm app}(r'), \qquad (37) $$

which gives, using $\omega \propto r^{-2}$, the bias owing to gas
infall

$$ B^{\rm infall} \equiv \frac{\Gamma^{\rm bkg,app}}{\Gamma^{\rm bkg,true}} = \left( \frac{r}{r'} \right)^2 \approx 1 - \frac{2 v_\parallel(r_{\rm typ})}{H r_{\rm typ}}. \qquad (38) $$
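As a numerical illustration of equation (38), the sketch below evaluates the exact ratio $(r/r')^2$ and its first-order approximation. The input values are illustrative assumptions: an infall velocity of $-300$ km s$^{-1}$ at $r = 5$ proper Mpc, and $H = 300$ km s$^{-1}$ Mpc$^{-1}$, chosen so that 300 km s$^{-1}$ corresponds to $\Delta r \approx 1$ proper Mpc as quoted in §4.2. Function names are hypothetical.

```python
def infall_bias(r, v_par, hubble):
    """Exact form of eq. (38): (r / r')**2 with r' = r + v_par / H."""
    r_shifted = r + v_par / hubble
    return (r / r_shifted) ** 2


def infall_bias_linear(r, v_par, hubble):
    """First-order form of eq. (38): 1 - 2 v_par / (H r)."""
    return 1.0 - 2.0 * v_par / (hubble * r)


if __name__ == "__main__":
    r, v_par, hubble = 5.0, -300.0, 300.0   # Mpc, km/s, km/s/Mpc (assumed values)
    print(infall_bias(r, v_par, hubble), infall_bias_linear(r, v_par, hubble))
```

Since infalling gas has $v_\parallel < 0$, both forms exceed unity, i.e., neglecting infall biases the inferred background high.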



5.4. Relation Between Overdensity and Infall Biases 

The density and velocity fields are related by mass conservation
through the continuity equation. We thus expect
$B^{\rm od}$ and $B^{\rm infall}$ to also be simply related. In this section,
we establish this connection. For $\delta \ll 1$, we may
use the linear theory continuity equation

$$ \frac{\partial\delta}{\partial t} + \nabla \cdot \mathbf{v} \approx \frac{\partial\delta}{\partial t} + \frac{\partial v_\parallel}{\partial r} = 0, \qquad (39) $$

where $t$ is proper time, and the spatial derivatives are
proper-coordinate derivatives. For the second equality,
we have assumed pure radial infall toward the center of
the halo and neglected spherical-geometry corrections to
the divergence, which are important only at small radii.
Within the spherical collapse model (Gunn & Gott 1972)
and for given initial conditions, one could write an exact
relation between $\delta$ and $v_\parallel$ from equation (39). However,
a simple expression for this relation is not available.
We thus simply proceed with an order-of-magnitude estimate,
expressing derivatives in terms of characteristic
scales at $r_{\rm typ}$:

$$ \frac{|\delta|}{t_{ff}} \sim \frac{|v_\parallel|}{r_{\rm typ}}, \qquad (40) $$

where the free-fall time for a spherically symmetric mass
configuration of interior mean density $\bar\rho$ is given by

$$ t_{ff} = \frac{1}{4} \sqrt{\frac{3\pi}{2 G \bar\rho}}. \qquad (41) $$

In fact the interior mean density exceeds the mean cos- 
mic density around a halo, for otherwise it would not 
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have collapsed. In spite of this, we use the cosmic mean 
density in the denominator of equation (|4Tj) , since using 
the full overdensity would result only in a second order 
correction to \S\/tff (eq. |40|) . 

Solving for $|v_\parallel|$ in equation (40) and substituting in
equation (38) for the bias owing to infall, we obtain

$$B^{\rm infall} \approx 1 + 1.57 f\,\delta(r_{\rm typ}). \qquad (42)$$

The factor $f$, of order unity, has been introduced to allow for
the approximations made. In deriving equation (42), we have made
use of the Friedmann equation, which expresses the Hubble parameter
in terms of $G\rho$,

$$H = \sqrt{\frac{8\pi G\rho}{3}} \qquad (43)$$

(again neglecting the cosmological constant, which is valid at the
redshifts of interest), and of the fact that $v_\parallel < 0$.
Comparing the expressions for $B^{\rm od}$ and $B^{\rm infall}$
(eqs. 36 and 42) in terms of $\delta(r_{\rm typ})$, we see that
both biases scale similarly with $\delta(r_{\rm typ})$. The total
bias can be estimated at
$B^{\rm tot} \approx B^{\rm od} \times B^{\rm infall} \approx 1 + 4.67\,\delta(r_{\rm typ})$
(for $f = 1$). At $z = 3$, $r_{\rm typ} \approx 5$ proper Mpc for a
quasar of typical luminosity (Figure 7) and
$\langle \delta(r_{\rm typ}) \rangle = 0.1$, with a large
point-to-point dispersion of $\sigma_\delta \approx 2$. One may thus
reasonably choose $\delta \approx 0.32$, obtaining
$B^{\rm tot} \approx 2.5$, and quantitatively make sense of the
results of the detailed calculations of the next section.
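The arithmetic behind the combined first-order estimate can be checked with a short sketch. The coefficients 3.1 and 1.57 are those quoted in the text; the function names and the inversion helper are introduced here for illustration only.

```python
def total_bias_first_order(delta, f=1.0, c_od=3.1, c_infall=1.57):
    """First-order total bias, 1 + (c_od + c_infall * f) * delta."""
    return 1.0 + (c_od + c_infall * f) * delta

def delta_for_bias(b_tot, f=1.0, c_od=3.1, c_infall=1.57):
    """Overdensity at r_typ that reproduces a given first-order total bias."""
    return (b_tot - 1.0) / (c_od + c_infall * f)

print(f"B_tot(delta = 0.32) = {total_bias_first_order(0.32):.2f}")
print(f"delta for B_tot = 2.5: {delta_for_bias(2.5):.3f}")
```

With $f = 1$, a total bias of 2.5 corresponds to $\delta(r_{\rm typ})$ close to the value of 0.32 adopted above.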

5.5. Biased T bkg Likelihoods 

To more accurately estimate the biases that could be present in
proximity effect measurements of $\Gamma^{\rm bkg}$, we perform the
full likelihood analysis of §5.2 on mock spectra with known
$\Gamma^{\rm bkg}$ and quasars which reside at the centers of dark
matter halos with masses in the fiducial range
$3.0 \pm 1.6 \times 10^{12}\,h^{-1}\,M_\odot$. We consider both the
case in which only the local overdensity is modeled in the mock
spectra and the realistic case in which the redshift-space
distortions owing to gas infall and thermal broadening are also
simulated. For the likelihood analysis, we choose the parameters
$(r_{\rm min}, r_{\rm max}, r_{\rm corr}) = (1, 20, 1)$ proper Mpc
so as to include the region where the proximity effect is important
for typical quasars (e.g., Figure 3). The correlation length is
selected based on measurements of the flux correlation function and
power spectrum in the Lyα forest, which indicate small correlations
on approximately this scale (McDonald et al. 2000, 2006). In
Figure 9, we show the ratio of the correlation function of
$\ln \tau$ to its variance, $\epsilon$, computed from the simulation
box (at random locations) at $z = 2$, 3, and 4. The ratio is
< 15% at $z = 4$ and 3, and < 30% at $z = 2$ for $r' > 1$ proper
Mpc. In Appendix D, we develop a toy model to estimate the effect
of correlations on estimates of the likelihood that ignore them,
such as the ones we calculate. We show there that the error on the
width of the estimated likelihood, with respect to the true
likelihood (which would take into account the correlations), is a
factor $\approx (1 - \epsilon/2)$ (for $\epsilon \ll 1$, neglecting
correlations between nonadjacent points), independent of the total
number of (independent) spectra used.

The resulting likelihood functions, computed on mock data sets of
100 typical spectra at $z = 2$, 3, and 4, are shown in Figure 10.
In each case, the maximum-likelihood $\Gamma^{\rm bkg}$
overestimates the true $\Gamma^{\rm bkg}$, as qualitatively
expected from the analytic estimates of §5.3. Moreover, the total
bias in the central ($z = 3$) panel agrees quantitatively with the
analytic estimate for that redshift. The "typical"
$\delta(r_{\rm typ})$ used for this estimate was in fact adjusted
to quantitatively reproduce the order-of-unity bias shown here; the
essential point is that the bias can be understood to order of
magnitude. Most importantly, the above shows that the fact that
quasars lie in massive host halos, associated with local matter
overdensities and gas infall, alone could bias proximity effect
measurements of $\Gamma^{\rm bkg}$ high by a factor ~ 2.5 at
$z \approx 3$. These effects should thus be taken into account in
future proximity effect analyses.

Fig. 9. Ratio of the correlation function of $\ln \tau$ to its
variance computed from the simulation at $z = 2$, 3, and 4 (top,
middle, and bottom). The ratio is < 15% at $z = 2$ and 3, and
< 30% at $z = 4$ for $r' > 1$ proper Mpc.

6. UNBIASED MONTE CARLO LIKELIHOOD FOR $\Gamma^{\rm bkg}$
6.1. Method

The likelihood analysis in the previous section is biased because
the scaling of equation (26) ignores the local matter overdensity
and redshift-space distortions induced by quasar host halos. The
likelihood method can however be modified to take these effects
into account and yield an unbiased measurement of
$\Gamma^{\rm bkg}$. To do so, it suffices to replace expression
(28) for $f_{\tau^{\rm prox}}(\tau^{\rm prox})$ in the likelihood
function (31) by PDFs numerically constructed from mock spectra
with quasars placed in halos of the correct mass and with
redshift-space distortions properly modeled. The PDFs constructed
in this way will match those expected in reality for the correct
value of $\Gamma^{\rm bkg}$. The likelihood method will in this
case be unbiased. In this section, we demonstrate how this can be
achieved. Because the PDFs used in this method are generated by
Monte Carlo, we refer to this analysis as the "Monte Carlo" (or MC)
likelihood.

Fig. 10. Likelihoods for $\Gamma^{\rm bkg}$ assuming the mapping
$\tau^{\rm prox} = \tau (1 + \omega)^{-1}$, computed on samples of
100 mock spectra at $z = 2$, 3, and 4 (top, middle, and bottom) for
quasars of typical luminosity lying at the centers of mass of dark
matter halos of mass in the range
$3.0 \pm 1.6 \times 10^{12}\,h^{-1}\,M_\odot$. The dashed curves
show the cases where the overdensities around quasars have been
modeled in the mock spectra, but with no redshift-space
distortions. The solid curves show the realistic case where the
mock spectra include both the overdensities and the redshift-space
distortions owing to gas infall and thermal broadening. The red
vertical dotted line indicates the known $\Gamma^{\rm bkg}$ in the
simulation ($10^{-12}$ s$^{-1}$). Deviations of the solid
likelihood peaks from this value are estimates of the bias which is
introduced in analyses, e.g. BDO-type, which ignore overdensities
and redshift-space distortions. At $z = 3$, this total bias is
~ 2.5, in quantitative agreement with the discrepancies between
existing proximity effect and flux decrement measurements of
$\Gamma^{\rm bkg}$ (c.f. Figure 1). The overdensities and
redshift-space distortions are seen to contribute approximately
equally to the total bias.

The first step is to construct optical depth PDFs from the mock
spectra as a function of redshift-space proper distance $r'$ from
the quasar. We assume here that the mass range of halos hosting
quasars, $[M_{\rm min}, M_{\rm max}]$, is known (e.g., from
clustering measurements). For any given set of quasar parameters
$\{z_Q, L_Q, \alpha_Q\}$, we generate mock spectra for quasars at
redshift $z_Q$ in halos in the mass range
$[M_{\rm min}, M_{\rm max}]$, with spectral energy distribution
parameterized by $L_Q$ and $\alpha_Q$.$^7$ We do so for each value
of $\Gamma^{\rm bkg}$ at which we wish to evaluate the likelihood.
In each case, 10 lines of sight are drawn from each of 100 halos
randomly chosen in the simulation box. This gives a total of 1000
lines of sight, which should capture well the variance between
halos of similar masses.

Since $\tau$ is nearly lognormally distributed, it is best to first
work in logarithmic space and then calculate the PDF of $\tau$
itself from the transformation

$$f_\tau(\tau) = \frac{1}{\tau} f_{\ln \tau}(\ln \tau). \qquad (44)$$
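The change of variables in equation (44) can be verified numerically. The sketch below assumes, purely for illustration, that $\ln \tau$ is exactly Gaussian with a hypothetical mean and width (neither is taken from the simulation), and checks that the transformed PDF integrates to unity over $\tau$.

```python
import math

def normal_pdf(x, mu, sigma):
    """Gaussian PDF, used here as a stand-in for the PDF of ln(tau)."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def pdf_tau_from_log(f_ln_tau, tau):
    """Equation (44): f_tau(tau) = f_lntau(ln tau) / tau, valid for tau > 0."""
    return f_ln_tau(math.log(tau)) / tau

# Assumed illustrative parameters for the distribution of ln(tau).
mu, sigma = -1.0, 0.8

def f_ln(x):
    return normal_pdf(x, mu, sigma)

# Trapezoid-rule integral of f_tau over a log-spaced grid in tau.
n = 4000
ln_grid = [mu - 8.0 * sigma + 16.0 * sigma * i / n for i in range(n + 1)]
taus = [math.exp(x) for x in ln_grid]
vals = [pdf_tau_from_log(f_ln, t) for t in taus]
integral = sum(0.5 * (vals[i] + vals[i + 1]) * (taus[i + 1] - taus[i]) for i in range(n))
print(f"integral of f_tau over tau = {integral:.4f}")
```

The factor $1/\tau$ is the Jacobian $d\ln\tau/d\tau$; omitting it would leave the PDF of $\tau$ unnormalized.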



For $r' \in \{r_{\rm min}, r_{\rm min} + r_{\rm corr}, ..., r_{\rm max}\}$,
the optical depths at redshift-space distance $r'$ from the quasar
are tabulated for the $10^4$ mock spectra. Using the first four
moments (mean, standard deviation, skewness, and kurtosis), we
obtain a very good approximate analytic expression for
$f_{\ln \tau}(\ln \tau)$ using the Edgeworth expansion, described
in Appendix C. Figure 11 shows a comparison of the Edgeworth
approximation to the data from which it is computed for a
representative example of the PDF of the optical depth in the
proximity regions of quasars of typical luminosity lying in halos
in the fiducial mass range at $z = 3$. After grids of such PDFs are
constructed, likelihood analysis is performed as before, simply
using these instead of the incorrectly scaled PDFs of equation
(28). The Edgeworth expansion accurately approximates the PDF of
the data, as illustrated in Figure 11. For $10^4$ data points at
redshift-space distance $r' = 0.5$ proper Mpc from quasars of
typical luminosity in halos in the fiducial mass range
$3.0 \pm 1.6 \times 10^{12}\,h^{-1}\,M_\odot$ at $z = 3$, the
reduced $\chi^2$ between the data and its Edgeworth approximation
is $\chi^2 = 0.87$. In §7.3, we compute likelihoods using this
approximation on mock spectra and recover the correct halo mass,
thus confirming that the approximation does not introduce
significant biases.
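A minimal sketch of a four-moment Edgeworth approximation is given below. It uses the standard textbook form of the series truncated after the skewness and kurtosis terms, which is an assumption: the exact expression adopted in Appendix C may differ. The synthetic sample standing in for $\ln \tau$ and all function names are hypothetical.

```python
import math
import random

def sample_moments(sample):
    """Mean, standard deviation, skewness, and excess kurtosis of a sample."""
    n = len(sample)
    mean = sum(sample) / n
    dev = [x - mean for x in sample]
    sigma = math.sqrt(sum(d * d for d in dev) / n)
    skew = sum(d ** 3 for d in dev) / n / sigma ** 3
    ex_kurt = sum(d ** 4 for d in dev) / n / sigma ** 4 - 3.0
    return mean, sigma, skew, ex_kurt

def edgeworth_pdf(x, mean, sigma, skew, ex_kurt):
    """Gaussian times Hermite-polynomial corrections (standard Edgeworth form)."""
    z = (x - mean) / sigma
    he3 = z ** 3 - 3.0 * z
    he4 = z ** 4 - 6.0 * z ** 2 + 3.0
    he6 = z ** 6 - 15.0 * z ** 4 + 45.0 * z ** 2 - 15.0
    gauss = math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))
    corr = 1.0 + skew / 6.0 * he3 + ex_kurt / 24.0 * he4 + skew ** 2 / 72.0 * he6
    return gauss * corr

# Synthetic, mildly skewed stand-in for ln(tau) values (not simulation data).
random.seed(0)
ln_tau = [random.gauss(-1.0, 0.8) + 0.3 * random.expovariate(1.0) for _ in range(10000)]
mean, sigma, skew, ex_kurt = sample_moments(ln_tau)
print(f"mean={mean:.3f} sigma={sigma:.3f} skew={skew:.3f} excess kurtosis={ex_kurt:.3f}")
print(f"f(ln tau = mean) = {edgeworth_pdf(mean, mean, sigma, skew, ex_kurt):.3f}")
```

Because the Hermite polynomials integrate to zero against the Gaussian weight, the approximation remains normalized for any choice of moments, although it can become slightly negative in the tails when the skewness or kurtosis is large.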

6.2. Test of the Monte Carlo Likelihood 

To test the MC likelihood method, we proceed as in §5.5 for the
$\tau$-scaling method: we again apply the analysis to mock data
sets with known $\Gamma^{\rm bkg}$ and see if the correct value is
recovered. We test the MC likelihood on spectra with both the local
overdensity and redshift-space distortions simulated as in §5.5.
The results are shown in Figure 12 for 100 spectra at $z = 2$, 3,
and 4. As expected, the correct value of $\Gamma^{\rm bkg}$ is
accurately recovered.

6.3. Relationship to Flux Decrement Method

Unlike the $\tau$-scaling method, which is sensitive only to the
scaling of the optical depth in the proximity region with respect
to far away from the quasar$^8$, the Monte Carlo likelihood method
is sensitive to the absolute level of absorption, both close to and
far from the quasar. This is because the MC PDFs at each distance
from the quasar depend on this absolute level. In fact, in the
limit where we ignore points in the proximity regions of the
quasars and extend the analysis to greater distances
($r_{\rm prox} \ll r_{\rm min} < r_{\rm max}$), the method
essentially reduces to the flux decrement approach. This proximity
analysis thus, although in principle unbiased, no longer presents
clear advantages over the flux decrement method, such as being
independent of $\Omega_b$.

$^7$ In fact, it is not necessary to know $L_Q$ and $\alpha_Q$
separately; it is sufficient to give the integral quantity
$\Gamma^{\rm QSO}$ at some distance from the quasar.

Fig. 11. Example of the Edgeworth approximation to the optical
depth PDF. The normalized histogram shows $10^4$ data points at
redshift-space distance $r' = 0.5$ proper Mpc from quasars of
typical luminosity in halos in the fiducial mass range
$3.0 \pm 1.6 \times 10^{12}\,h^{-1}\,M_\odot$ at $z = 3$ in the
simulation. The Edgeworth expansion, calculated from the first four
moments of the data, is shown by the solid curve. Neglecting bins
with fewer than 500 data points (which reduce the $\chi^2$ even
further since the tails are best approximated) and accounting for
the four degrees of freedom estimated from the data, the reduced
$\chi^2 = 0.87$, i.e. the Edgeworth approximation is a very good
fit to the data.

7. MASS OF QUASAR HOST HALOS FROM THE 
PROXIMITY EFFECT 

In the MC likelihood method of the previous section, we assumed
that $M_{\rm DM}$ was known and maximized the likelihood function
with respect to $\Gamma^{\rm bkg}$. In this section, we assume we
have an independent measure of $\Gamma^{\rm bkg}$, coming either
from the flux decrement method or any other reliable measurement.
We then proceed as in the previous section, but instead
parameterize the likelihood function by the mass of quasar host
halos and try to constrain it from the data. In what follows, we
test this idea on mock spectra.

$^8$ The optical depth statistics away from each quasar are
estimated from the data in the analysis.

Fig. 12. Likelihoods for $\Gamma^{\rm bkg}$ computed from Monte
Carlo optical depth PDFs on samples of 100 mock spectra at
$z = 2$, 3, and 4 (top, middle, and bottom) for quasars of typical
luminosity lying at the centers of mass of dark matter halos of
mass in the range $3.0 \pm 1.6 \times 10^{12}\,h^{-1}\,M_\odot$,
with redshift-space distortions and overdensities fully modeled. As
expected, the simulated $\Gamma^{\rm bkg}$ ($10^{-12}$ s$^{-1}$,
indicated by the vertical red dotted lines) is accurately
recovered.

7.1. Method

PDFs are numerically constructed as in §6.1, but this time
$\Gamma^{\rm bkg}$ is fixed and the mass range
$[M_{\rm min}, M_{\rm max}]$ is varied. For each mass range, we
again draw 10 lines of sight from 100 randomly selected halos.
Examples of resulting PDFs for different redshift-space distances
from the quasars and host halo masses at $z = 3$ are shown in
Figure 6. The number of halos employed in constructing the PDFs
imposes limitations on the width of the mass ranges that can be
used, since the simulation box contains a finite number of halos in
any given mass interval. Moreover, the width of the optical depth
PDF for any given mass range depends on the width of the range. We
thus consider mass ranges of fixed width in the logarithm, equal to
that of the fiducial range
$3.0 \pm 1.6 \times 10^{12}\,h^{-1}\,M_\odot$, covering the range
of halo masses represented in the simulation and requiring that
each mass range contains at least 100 halos. The mass ranges are
indexed by the central mass in logarithmic space,
$M_{\rm DM} = \exp[0.5(\ln M_{\rm min} + \ln M_{\rm max})]$. This
choice is consistent with the exponential nature of the mass
function (e.g., Jenkins et al. 2001) and avoids an excessive
pile-up of halos to the left of the mass index.
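The logarithmic mass indexing of §7.1 can be sketched as follows. The example interprets the fiducial range $3.0 \pm 1.6$ as the interval $[1.4, 4.6] \times 10^{12}\,h^{-1}\,M_\odot$, which is an assumption made here for illustration, as are the function names and the trial central masses.

```python
import math

def log_central_mass(m_min, m_max):
    """Central mass in logarithmic space, exp[0.5 (ln m_min + ln m_max)]."""
    return math.exp(0.5 * (math.log(m_min) + math.log(m_max)))

def mass_range_for_center(m_center, log_width):
    """Mass range of fixed logarithmic width centered (in log) on m_center."""
    half = 0.5 * log_width
    return m_center * math.exp(-half), m_center * math.exp(half)

# Assumed reading of the fiducial range, in units of h^-1 Msun.
fid_min, fid_max = 1.4e12, 4.6e12
log_width = math.log(fid_max / fid_min)
m_fid = log_central_mass(fid_min, fid_max)
print(f"fiducial central mass = {m_fid:.3e}, ln-width = {log_width:.3f}")

# Hypothetical trial central masses sharing the fiducial logarithmic width.
for m_c in (5.0e11, 1.0e12, m_fid, 5.0e12):
    lo, hi = mass_range_for_center(m_c, log_width)
    print(f"M_DM = {m_c:.2e}: [{lo:.2e}, {hi:.2e}]")
```

The logarithmic center is the geometric mean of the range limits, so it lies below the arithmetic midpoint of the interval.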

7.2. Location of Mass Information 

In the case of $\Gamma^{\rm bkg}$, the proximity effect optical
depth statistics are expected to convey significant information in
the entire region where $\Gamma^{\rm QSO}$ and $\Gamma^{\rm bkg}$
are comparable. This region can extend to > 20 proper Mpc for
bright quasars at $z = 3$ (Figure 7). However, the halo overdensity
and infall velocity (in units of $Hr$) profiles drop greatly on
scales < 1 proper Mpc (Figures 4 and 5). The information about the
dark matter halos is contained in these signatures and it is a
priori unclear how far away from a quasar one can go and still
learn about its host halo. We address this question in this
section, before proceeding with testing the likelihood method for
recovering $M_{\rm DM}$ in the next.

To quantify the mass information content as a function of distance
from a quasar, we use the Kolmogorov-Smirnov (K-S) test (Press et
al. 1992). This statistical test compares two random samples and
quantifies the significance that the realized values were drawn
from different distributions via the K-S P-value, $P_{\rm KS}$. If
$P_{\rm KS} < \alpha$, then a deviation as large as observed
between the two random samples would have probability $< \alpha$ to
occur by chance if the two samples were drawn from the same
distribution.

Here, we fix $\{z_Q, L_Q, \alpha_Q\}$, the known host halo mass,
$M^{\rm true}_{\rm DM}$, and the number of spectra $N_Q$ in the
mock data set and consider points at increasing distance from the
quasar. At each redshift-space distance $r'$, we vary $M_{\rm DM}$.
For each trial value, we compare the sample of $N_Q$ optical depths
at distance $r'$ in the mock spectra to a sample of 1000 optical
depths for $M^{\rm trial}_{\rm DM}$. If the resulting $P_{\rm KS}$
is small, then the optical depth statistics can significantly
distinguish between $M^{\rm true}_{\rm DM}$ and
$M^{\rm trial}_{\rm DM}$.
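The comparison described above can be sketched with a self-contained two-sample K-S test. The P-value uses the standard asymptotic series with the usual small-sample correction to the effective sample size. The lognormal samples standing in for optical depths, their parameters, and the function names are assumptions for illustration; they are not drawn from the simulation.

```python
import math
import random

def ks_statistic(sample_a, sample_b):
    """Maximum absolute difference between the two empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    na, nb = len(a), len(b)
    i = j = 0
    d = 0.0
    while i < na and j < nb:
        x = min(a[i], b[j])
        while i < na and a[i] <= x:   # advance past all points <= x (handles ties)
            i += 1
        while j < nb and b[j] <= x:
            j += 1
        d = max(d, abs(i / na - j / nb))
    return d

def ks_pvalue(d, na, nb, terms=100):
    """Asymptotic two-sample K-S P-value for statistic d."""
    n_eff = na * nb / (na + nb)
    lam = (math.sqrt(n_eff) + 0.12 + 0.11 / math.sqrt(n_eff)) * d
    if lam < 0.2:                     # series converges poorly; P is ~1 here
        return 1.0
    total = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                for k in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * total))

# Synthetic stand-ins: N_Q = 100 "mock" optical depths versus 1000 "trial" ones.
random.seed(1)
mock = [random.lognormvariate(-1.0, 0.8) for _ in range(100)]
trials = {
    "same distribution": [random.lognormvariate(-1.0, 0.8) for _ in range(1000)],
    "shifted distribution": [random.lognormvariate(-0.5, 0.8) for _ in range(1000)],
}
for label, trial in trials.items():
    d = ks_statistic(mock, trial)
    print(f"{label}: D = {d:.3f}, P_KS = {ks_pvalue(d, len(mock), len(trial)):.3g}")
```

A small $P_{\rm KS}$ for the shifted sample indicates that the two sets of optical depths are unlikely to share a parent distribution, which is the sense in which a trial mass is distinguished from the true one.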

Figure 13 shows the results of this K-S analysis at $z = 2$, 3, and
4. Because the realized $P_{\rm KS}$ values vary with the random
samples, we show the mean value (dot) and the standard deviation
(error bar) for each mass, computed from ensembles of 50 sets of
mock spectra. At each redshift, the mass information appears mostly
contained at points $r' < 1$ proper Mpc (note that this is still
~ 20 $r_{\rm vir}$; see eq. 10), although this conclusion is
somewhat dependent on the range of masses probed by our simulation.
Even if $P_{\rm KS}$ drops only marginally for masses differing
from $M^{\rm true}_{\rm DM}$ for the masses probed, the analysis
would presumably significantly rule out much smaller or much larger
masses. This is supported by Figure 6, which shows that the optical
depth PDFs for halos located in random locations (effectively,
$M_{\rm DM} = 0$) strongly differ from those of the massive halos
resolved in the simulation.

7.3. Test of the Halo Mass Likelihood 

Figure 14 shows likelihoods for $M_{\rm DM}$ computed on samples of
100 mock spectra for typical quasars lying in halos in the fiducial
mass range $3.0 \pm 1.6 \times 10^{12}\,h^{-1}\,M_\odot$. For these
likelihood calculations, we use
$(r_{\rm min}, r_{\rm max}) = (0.1, 5.1)$ proper Mpc and vary the
correlation length, $r_{\rm corr}$, assumed in the computation of
the likelihoods from 1.0 proper Mpc to 0.5 and 0.1 proper Mpc.
Better apparent constraints are obtained with smaller
$r_{\rm corr}$, as more data points are used. However, the width of
the estimated likelihoods is underestimated by an increasing factor
as more correlated points are assumed independent (see Appendix D).
The correct mass range is recovered with a precision comparable to
the width of the true mass interval in the case
$r_{\rm corr} = 1.0$ proper Mpc, in which case the correlations are
small (see Figure 9). This shows that, in principle, it is possible
to measure quasar host halo masses from the proximity effect with a
data set of modest size. In fact, a data set of 100 spectra is two
orders of magnitude smaller than those used in state-of-the-art
clustering analyses such as that of Croom et al. (2005). It is
true, however, that the clustering analyses have relatively low
spectroscopic requirements. As we discuss in §8, important
challenges must be addressed before the level of precision
estimated here can be achieved in practice. In fact, realistic
spectra have finite resolution and signal-to-noise ratio. Moreover,
the continuum level is not given and must be estimated, and
systemic quasar redshifts are often not precisely known.

8. PRACTICAL CONSIDERATIONS 

Throughout this paper, we have optimistically assumed ideal data
sets. In particular, we have assumed spectra with infinite
resolution and signal-to-noise ratio (SN), and perfect redshift
determinations for the quasars. In this section, we briefly discuss
the complications associated with realistic data. As we will argue,
if the continuum flux and redshift of each quasar are known
accurately, finite resolution and signal-to-noise ratio can be
accounted for and do not present serious problems. However, our
ability to estimate a quasar's continuum flux is limited by the
resolution and noise in the spectra, and accurate quasar systemic
redshifts are challenging to obtain. These difficulties are likely
to be the most important in applying the method presented in this
paper to measure quasar host halo masses to actual data.

8.1. Finite Resolution and Signal-to-Noise Ratio 

Let us first suppose that the continuum flux and red- 
shift of each quasar are given, and consider the effects of 
finite resolution and signal-to-noise ratio. 

Finite resolution alters the optical depth PDF by smoothing out the
small-scale fluctuations. This can in principle be simply accounted
for by smoothing the mock spectra from which the model PDFs are
constructed to the same resolution as the actual data.

Fig. 13. Kolmogorov-Smirnov P-value for the significance of the
difference between the true
($M_{\rm DM} = 3.0 \pm 1.6 \times 10^{12}\,h^{-1}\,M_\odot$) and
trial optical depth PDFs as a function of the quasar host dark
matter halo mass range, at increasing apparent distances from the
quasar. Each trial mass range has the same logarithmic width as the
true one (indicated by the vertical red lines) and is labeled by
the central mass in logarithmic space. In each case, the P-value is
averaged over 50 samples of 100 typical quasar spectra at $z = 4$
and the vertical error bars indicate the standard deviation. At
$z = 4$, the K-S points do not appear to fall on the right side of
the true mass range simply because there are too few massive halos
in the simulation box to probe mass ranges much larger than the
fiducial one.

Fig. 14. Likelihoods for $M_{\rm DM}$ computed on samples of 100
mock spectra at $z = 2$, 3, and 4 (top, middle, and bottom) for
quasars of typical luminosity lying at the centers of mass of dark
matter halos of mass in the range
$3.0 \pm 1.6 \times 10^{12}\,h^{-1}\,M_\odot$ (indicated by the
vertical red lines). $\Gamma^{\rm bkg} = 10^{-12}$ s$^{-1}$ is
assumed to be known exactly. The correlation length,
$r_{\rm corr}$, assumed in the computation of the likelihoods is
1.0 proper Mpc (thick solid curves), 0.5 proper Mpc (dashed
curves), and 0.1 proper Mpc (thin solid curves). Better apparent
constraints are obtained with smaller $r_{\rm corr}$, as more data
points are used. However, the width of the estimated likelihoods is
underestimated by an increasing factor as more correlated points
are assumed independent (Appendix D). The correct mass range is
recovered with a precision comparable to the width of the true mass
interval in the case $r_{\rm corr} = 1.0$ proper Mpc, in which case
the correlations are small (Figure 9). The left-most value on the
horizontal axis corresponds to the smallest-mass halos resolved in
our simulation. The $z = 4$ likelihood is truncated at high masses
simply because the simulation box contains too few halos with mass
much larger than the fiducial range to evaluate the likelihood at
those points. The likelihoods on this figure were normalized to
peak at 0.01.

A finite signal-to-noise ratio requires more spectra to detect
intrinsic fluctuations in the IGM of a fixed amplitude. The total
variance on the pointwise transmission $F = e^{-\tau}$ is a sum of
the variance due to intrinsic IGM fluctuations, $\sigma_F^2$, and
the variance in the observed flux due to the finite SN,
$\sigma_{SN}^2 = 1/SN^2$. Thus, the error on the standard estimator
for the mean $F$ estimated from $N$ spectra is

$$\sigma(\bar{F}) = \sqrt{\frac{\sigma_F^2 + \sigma_{SN}^2}{N}}. \qquad (45)$$

The PDFs shown in Figure 6 suggest that distinguishing between
different halo masses requires measuring the mean $F$ at the
percent level. Thus, one needs approximately

$$N \approx \frac{\sigma_F^2 + 1/SN^2}{0.01^2} \qquad (46)$$

spectra to distinguish between different realistic halo masses. At
$z \sim 3$, typical intrinsic fluctuations have
$\sigma_F \approx 0.2$, thus requiring $SN \approx 5$ to dominate
over the noise. This, by itself, is a modest requirement. In that
case, of order $10^3$ spectra are necessary to distinguish between
different halo masses. Of course, valuable mass information may
also be encoded in the higher moments of the transmission, in
particular in the incidence of large (~ 1) optical depths. Several
data points from each spectrum can also in principle be used (e.g.,
as in the likelihood method described in §5.2), alleviating the
data requirements.
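The estimate of equation (46) is simple to evaluate. The sketch below uses the values quoted in the text ($\sigma_F \approx 0.2$, $SN \approx 5$, a target error of 0.01 on the mean transmission); the function name is introduced here for illustration.

```python
def n_spectra(sigma_f, sn, target_error=0.01):
    """Number of spectra needed to measure the mean transmission to target_error,
    with one data point per spectrum (equation 46)."""
    return (sigma_f ** 2 + 1.0 / sn ** 2) / target_error ** 2

for sn in (5.0, 10.0, 50.0):
    print(f"SN = {sn:g}: N ~ {n_spectra(0.2, sn):.0f}")
```

For $SN = 5$ the noise and intrinsic variances are equal and $N$ is about 800, consistent with the order-$10^3$ figure quoted above. At high SN the requirement saturates at $\sigma_F^2/0.01^2 = 400$, set by the intrinsic fluctuations alone.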

8.2. Continuum Fitting 

An indirect, more severe effect of finite resolution and SN is to
make accurate continuum fitting more difficult. In the case of
finite SN, the "peaks" in the observed spectra which are usually
fitted to estimate the continua in general do not exactly
correspond to a transmission of unity, owing to the noise
contribution. In the case of finite resolution, the peaks are
smoothed and their height may therefore be underestimated. There is
also a fundamental limit to how precisely quasar spectra can be
continuum fitted, since peaks of unity transmission become
increasingly rare, if extant at all, at high (> 4) redshifts.
Continuum errors are of special concern because continuum fitting
is a somewhat subjective procedure and they are likely to be
affected by systematics to some extent. It may thus be fruitful to
devise more sophisticated methods which bypass continuum
estimation, along the lines suggested by Lidz et al. (2006a) for
the analysis of the matter power spectrum.

8.3. Redshift Errors 

In practice, the uncertainties on quasar redshifts are substantial.
The redshifts determined from broad emission lines, which are
affected by inflows and outflows of material, can differ
significantly from the actual systemic redshift of the quasar
(Gaskell 1982; Tytler & Fan 1992; Vanden Berk et al. 2001; Richards
et al. 2002). For example, Richards et al. (2002) find that the
broad C IV emission line has a median blueshift of 824 km s$^{-1}$
with respect to the narrow Mg II line, with a dispersion about the
median of 511 km s$^{-1}$. At $z = 3$, this corresponds to a
physical scale of a few Mpc, comparable to the radius which is
substantially affected by the host halo of the quasar. Narrow
emission lines, such as [O III] 5007 Å and Mg II 2798 Å, which are
associated with the host galaxy, can provide systemic redshifts to
a precision $\approx$ 50-300 km s$^{-1}$ (J. Hennawi, private
communication; see also Vanden Berk et al. 2001 and Richards et al.
2002a). At $z > 2$, these narrow lines however require measurements
extending in the near infrared. The importance of accurate
redshifts has been highlighted in the context of the transverse
proximity effect by Hennawi et al. (2006) and Hennawi & Prochaska
(2006). Methods to obtain robust systemic quasar redshifts, like
that developed by Hennawi et al. (2006) and Shen et al. (2007), are
therefore likely to be necessary for accurate proximity effect
work.
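To make the correspondence between velocity errors and physical scales concrete, the sketch below converts a redshift error expressed as a velocity into a proper distance, $\Delta r = \Delta v / H(z)$. The flat ΛCDM parameters ($h = 0.7$, $\Omega_m = 0.3$) and the function names are assumptions for illustration, not the cosmology adopted in the analysis.

```python
import math

def hubble_proper(z, h=0.7, omega_m=0.3):
    """H(z) in km/s per proper Mpc for a flat LCDM model (assumed parameters)."""
    omega_l = 1.0 - omega_m
    return 100.0 * h * math.sqrt(omega_m * (1.0 + z) ** 3 + omega_l)

def velocity_to_proper_mpc(dv_kms, z):
    """Proper distance corresponding to a line-of-sight velocity offset."""
    return dv_kms / hubble_proper(z)

# Velocity offsets quoted in the text, evaluated at z = 3.
for dv in (824.0, 511.0, 300.0, 50.0):
    print(f"dv = {dv:5.0f} km/s -> {velocity_to_proper_mpc(dv, 3.0):.2f} proper Mpc")
```

Under these assumptions, the median C IV blueshift maps to between 2 and 3 proper Mpc at $z = 3$, whereas narrow-line precisions of 50-300 km s$^{-1}$ correspond to less than about 1 proper Mpc.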

9. COMPARISON WITH OTHER WORK 

Loeb & Eisenstein (1995) first estimated the bias in proximity
effect measurements of $\Gamma^{\rm bkg}$ arising from the
overdense environments of quasars. They found that the bias could
be up to a factor ~ 3, consistent with our results. Their
analytical analysis, very different from ours, was based on
counting the Lyα absorption lines Doppler shifted beyond the quasar
redshift owing to gas infall. Rollinde et al. (2005) also proposed
using the proximity effect to study the density structure around
quasars. They however incorrectly neglected the effects of gas
infall toward the quasars, which we have shown to be very
important. Instead of using dark matter halos from their
simulation, they simply modeled the density enhancement around
quasars with an analytic function of distance multiplying the
density field. This approach neglects the fact that the density
fluctuations are not necessarily linearly scaled as a quasar lying
in a halo is approached, e.g. the fluctuations and their standard
deviation need not be scaled by the same factor. As Figure 4 shows,
the ratio of $\langle \delta \rangle$ to the pointwise standard
deviation of $\delta$ varies strongly with distance, violating the
authors' assumption. This approach also has the drawback of not
relating the recovered density profiles to the mass of the quasar
host halos. Guimarães et al. (2007) have applied the Rollinde et
al. (2005) model on a sample of 45 quasars at $z > 4$ and measured
an average halo profile extending to > 40 proper Mpc, an order of
magnitude farther than expected from simulations of structure
formation (c.f., Figure 5), casting doubt on the validity of their
approach. This anomalous result is likely an artifact of an
inaccurate modeling of the ionizing effect of the quasars (the
standard proximity effect), which is dominant and must be
subtracted in order to measure the host halo mass. For instance,
these authors neglect redshift-space distortions altogether and
their analytical model is partly heuristic and mathematically
inexact.

10. CONCLUSION 

In this paper, we have shown that the Lyα optical depth statistics
in the proximity regions of quasar spectra are significantly
affected by the massive halos which they are expected to occupy. In
general, the mean optical depth increases with halo mass. We have
then quantified the biases induced in line-of-sight proximity
effect measurements of the background photoionizing rate
$\Gamma^{\rm bkg}$ that neglect the effects of quasar host halos
assuming ideal data, i.e. that the quasar spectra have infinite
resolution and signal-to-noise ratio and perfectly known continuum
level and systemic redshift. The local matter overdensity around
and the infall of gas toward the quasars contribute approximately
equally to the total upward bias. At $z \approx 3$, where most
proximity effect measurements of $\Gamma^{\rm bkg}$ have been made,
the proximity effect $\Gamma^{\rm bkg}$ bias for host halo masses
in the range $3.0 \pm 1.6 \times 10^{12}\,h^{-1}\,M_\odot$, as
inferred by Croom et al. (2005) from clustering measurements, could
be $\approx 2.5$, enough to bring the existing proximity effect and
flux decrement measurements into agreement. The existing proximity
effect measurements were however made on data of finite quality,
with continua and redshifts estimated from the data. These
observational difficulties may also affect the validity of those
proximity effect measurements, beyond the effects investigated
here. The fact that quasars lie in overdense regions of the
Universe introduces very significant biases; this is nonetheless a
robust conclusion of our study, and should be taken into account in
future analyses.

The clustering of galaxies and other AGN around proximity effect
quasars has a small effect on the local magnitude of the background
ionizing flux, at least at $z < 3$, and therefore is not expected
to significantly bias $\Gamma^{\rm bkg}$ measurements.



By constructing optical depth PDFs by Monte Carlo from realistic
mock spectra with quasars in host halos of prescribed mass, one can
perform a likelihood analysis for $\Gamma^{\rm bkg}$ which is
unbiased. However, this method is sensitive to the absolute level
of Lyα absorption, both close to and far from the quasars. In the
limit where only points far from the quasars are considered in the
analysis, it essentially reduces to the flux decrement method. As
such, the method does not possess the main advantage of "true"
proximity effect analyses (which are only sensitive to the change
in absorption statistics near the quasars) that they are relatively
free of cosmological parameter assumptions. In particular, this
method requires knowledge of $\Omega_b$ and so a
$\Gamma^{\rm bkg}$ value obtained with it cannot be used in
conjunction with the flux decrement to constrain the baryon
density.

Of perhaps greatest interest, we have demonstrated how, given a
measurement of $\Gamma^{\rm bkg}$ (e.g., from the flux decrement),
the proximity effect analysis can in principle be inverted to probe
the environments of quasars. In particular, we have shown that the
masses of dark matter halos hosting quasars could be measured using
the optical depth statistics in the proximity regions of quasars.

As mentioned, we have in this work side-stepped a number of
observational issues. For instance, we have assumed that the
continuum flux and redshift of each quasar were known exactly; in
practice, these are plagued by uncertainties. Moreover, real
spectra have finite resolution and signal-to-noise ratio. From our
discussion of these complications, we have concluded that accurate
continuum fitting and quasar redshift determination are likely to
be the most important challenges to using our method to measure
halo masses. Further work in developing techniques to either
improve or bypass these measurements would thus be highly
desirable. Our theoretical picture is also simplified in some
respects. For instance, we have assumed that the low-density Hui &
Gnedin (1997) equation of state for the IGM holds all the way to
the centers of the quasar host halos, which may not be the case. In
particular, if helium is doubly ionized by quasars at $z \sim 3$
(e.g., Madau et al. 1999), the temperature should be enhanced close
to quasar sources. Whether such thermal effects are important in
the vicinity of quasars could be investigated by studying how the
small-scale power spectrum in the Lyα forest varies as the quasars
are approached, as was proposed by Zaldarriaga (2002) for the
general forest.
TABLE 2

Existing Flux Decrement and Proximity Effect Measurements of the Background Photoionizing Flux

Redshift      $J_{-21}$                                   $\Gamma^{\rm bkg}$                              Reference
              (ergs s$^{-1}$ cm$^{-2}$ Hz$^{-1}$ sr$^{-1}$)   ($10^{-12}$ s$^{-1}$)

Flux Decrement

1.95          ...                                         1.33                                            Jena et al.
3             ...                                         $1.3 \pm 0.1$                                   Kirkman et al.
2             ...                                         $1.3^{+0.8}_{-0.5}$                             Bolton et al.
3             ...                                         $0.9 \pm 0.3$
4             ...                                         $1.0^{+0.5}_{-0.3}$
1.9           ...                                         $1.44 \pm 0.11$                                 Tytler et al. (2004a)
2.75          ...                                         0.86 [errors illegible]                         Meiksin & White (2004)
3.0           ...                                         [illegible]$^{+0.14}_{-0.12}$
3.89          ...                                         $0.68^{+0.08}_{-0.07}$
4.0           ...                                         $0.43^{+0.06}_{-0.05}$
5.0           ...                                         $0.31^{+0.07}_{-0.09}$
5.5           ...                                         $0.37^{+0.06}_{-0.05}$
6.0           ...                                         $< 0.14$
2.4           ...                                         $0.698 \pm 0.096$                               McDonald & Miralda-Escudé (2001)
3             ...                                         $0.518 \pm 0.083$
3.9           ...                                         $0.380 \pm 0.04$
4.5           ...                                         $0.21 \pm 0.04$
4.93          ...                                         $0.13 \pm 0.03$
5.2           ...                                         $0.16 \pm 0.04$
4.72          $> 0.04^{a}$                                $> 0.129$                                       Songaila et al. (1999)
[1.5,2.5]     ...                                         0.890                                           Rauch et al. (1997)$^{b}$
[2.5,3.5]     ...                                         0.698
[3.5,4.5]     ...                                         0.618

Proximity Effect

[1.7,3.8]     $0.7^{+0.34}_{-0.44}$                       $1.9^{+1.2}_{-1.0}$ $^{c,d}$                    Scott et al. (2000)
[2.0,4.5]     $1.0^{+0.5}_{-0.3}$                         $2.6^{+1.3}$ [lower error illegible]            Cooke et al. (1997)
[1.7,3.7]     0.6                                         1.6*                                            Srianand & Khare (1996)
[1.7,4.1]     $0.5 \pm 0.1$                               $1.3 \pm 0.3$*                                  Giallongo et al. (1996)
3.66          $0.5^{e}$                                   1.3*                                            Cristiani et al. (1995)
$\approx 4.2$ $\approx 0.1$-$0.3^{f}$                     0.3-0.8*                                        Williger et al. (1994)
$\approx 0.5$ $0.006^{g}$                                 0.02*                                           Kulkarni & Fall (1993)
[1.7,3.8]     $1_{-0.7}$ [upper error illegible]          $2.6^{+4.2}$* [lower error illegible]           Lu et al. (1991)
[1.7,3.8]     $1^{+3.2}_{-0.7}$                           $2.6^{+8.3}$* [lower error illegible]           Bajtlik et al. (1988)
3.75          $> 3^{h}$                                   $> 7.8$*                                        Carswell et al. (1987)

$^{a}$ Assuming a spectral index of 0.7 for the background flux.
$^{b}$ For their ΛCDM cosmology.
$^{c}$ The authors claim that the presence of lines on the saturated part of the curve of growth could cause their estimate to be overestimated by a factor of 2-3.
$^{d}$ Contrary to other proximity effect analyses, this value does not assume a spectral index for the background; the authors repeated their analysis solving directly for $\Gamma^{\rm bkg}$.
$^{e}$ From a single quasar, QSO 0055-269.
$^{f}$ From a single quasar, QSO BR 1033-0327.
$^{g}$ The uncertainties are large; at the 1σ level, it could be lower by a factor of 3 or higher by a factor of 6.
$^{h}$ From a single quasar, PKS 2000-330.
* Calculated from $J_{\nu_{\rm LL}}$ assuming a spectral index for the background flux equal to the typical value for radio-quiet quasars, $\alpha = 1.57$ (Telfer et al. 2002). These authors assumed that the background flux had the same spectral index as the quasars shortward of the Lyman limit in their analyses, from which they inferred $J_{\nu_{\rm LL}}$.

APPENDIX

A. EXISTING FLUX DECREMENT AND PROXIMITY EFFECT MEASUREMENTS OF THE BACKGROUND PHOTOIONIZING FLUX

In this appendix, we tabulate the existing measurements of $\Gamma^{\rm bkg}$ from the flux decrement and proximity effect methods. This serves as a quantitative complement to Figure 1.
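The asterisked $\Gamma^{\rm bkg}$ entries of Table 2 are obtained from the specific intensity at the Lyman limit by assuming a spectral index for the background. A minimal sketch of that conversion is given below. It is not taken from the paper: it assumes a pure power-law background, $J_\nu \propto \nu^{-\alpha}$, and the approximate $\nu^{-3}$ scaling of the hydrogen cross section above threshold, for which $\Gamma = 4\pi J_{\nu_{\rm LL}} \sigma_{\rm LL} / [h(3+\alpha)]$. The function name `gamma_from_j21` and the adopted threshold cross section are illustrative choices.

```python
import math

# Physical constants in cgs units.
H_PLANCK = 6.626e-27   # Planck constant [erg s]
SIGMA_LL = 6.30e-18    # assumed H I cross section at the Lyman limit [cm^2]


def gamma_from_j21(j21, alpha):
    """Hydrogen photoionization rate [s^-1] from J_-21 at the Lyman limit.

    Assumptions (not from the paper): J_nu = J_LL (nu/nu_LL)^-alpha and
    sigma(nu) = SIGMA_LL (nu/nu_LL)^-3, so that the integral of
    4 pi J_nu sigma(nu) / (h nu) over nu >= nu_LL has the closed form
    4 pi J_LL SIGMA_LL / (h (3 + alpha)).
    """
    j_ll = j21 * 1.0e-21  # [erg s^-1 cm^-2 Hz^-1 sr^-1]
    return 4.0 * math.pi * j_ll * SIGMA_LL / (H_PLANCK * (3.0 + alpha))


# Conversion factor per unit J_-21 for the radio-quiet quasar index of Table 2.
factor_rq = gamma_from_j21(1.0, 1.57) / 1.0e-12
# Lower limit of footnote a (J_-21 > 0.04 with a spectral index of 0.7).
limit_a = gamma_from_j21(0.04, 0.7) / 1.0e-12
print(f"Gamma_-12 per unit J_-21 (alpha = 1.57): {factor_rq:.2f}")
print(f"Gamma_-12 for J_-21 = 0.04 (alpha = 0.7): {limit_a:.3f}")
```

Under these assumptions the conversion factor for $\alpha = 1.57$ is close to 2.6, in line with the ratio between the two columns for the asterisked rows, and the $\alpha = 0.7$ case gives a value near the 0.129 limit quoted for $z = 4.72$.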



B. ACCURATE EXPRESSIONS 

In some of our calculations, we have used approximate expressions for the physical properties of the IGM in order to
improve computational efficiency. These approximations are not expected to have any significant effect on our results,
as they were applied consistently. However, when the methods presented in this work are applied to real spectra,
it may be important to use more accurate expressions, in order to be consistent with the physical processes as they
occur in the Universe. Some lengthy formulae were also omitted in the main body in order to improve readability. In
this Appendix, based on that of Hui & Gnedin (1997), we collect explicit accurate expressions missing in the text.

Hydrogen photoionization cross section:

$$\sigma(E) = 5.475 \times 10^{-14}\ {\rm cm}^2 \left(\frac{E}{0.4298\ {\rm eV}} - 1\right)^2 \frac{(E/0.4298\ {\rm eV})^{-4.018}}{\left(1 + \sqrt{E/14.13\ {\rm eV}}\right)^{2.963}} \tag{B1}$$

(Verner et al. 1996; accurate to 10% from the ionization thresholds to 5 keV).

Hydrogen recombination coefficient:

$$R(T) = 1.269 \times 10^{-13}\ {\rm cm}^3\ {\rm s}^{-1}\ \frac{(2 \times 157{,}807\ {\rm K}/T)^{1.503}}{\left[1.0 + (2 \times 157{,}807\ {\rm K}/0.522T)^{0.470}\right]^{1.923}} \tag{B2}$$

(fit by Hui & Gnedin 1997 to Ferland et al. 1992 data; accurate to 2% from 3 to $10^9$ K).
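The two fitting formulae above are straightforward to evaluate numerically. The following is a minimal sketch, not the authors' code: the function names `sigma_hi` and `recomb_hi` are hypothetical, and returning zero below the 13.598 eV threshold is an assumption made here for convenience (the fit itself is only quoted from threshold upward).

```python
import math

E_THRESHOLD_EV = 13.598  # assumed H I ionization threshold [eV]


def sigma_hi(energy_ev):
    """H I photoionization cross section [cm^2], fit of Verner et al. (1996).

    Returns 0 below the ionization threshold (an assumption of this sketch).
    """
    if energy_ev < E_THRESHOLD_EV:
        return 0.0
    x = energy_ev / 0.4298
    return (5.475e-14 * (x - 1.0) ** 2 * x ** (-4.018)
            / (1.0 + math.sqrt(energy_ev / 14.13)) ** 2.963)


def recomb_hi(temperature_k):
    """H recombination coefficient [cm^3 s^-1], fit of Hui & Gnedin (1997)."""
    lam = 2.0 * 157807.0 / temperature_k
    return (1.269e-13 * lam ** 1.503
            / (1.0 + (lam / 0.522) ** 0.470) ** 1.923)


print(f"sigma at 13.6 eV: {sigma_hi(13.6):.3e} cm^2")
print(f"R at 1e4 K:       {recomb_hi(1.0e4):.3e} cm^3 s^-1")
```

As a sanity check, the cross section near threshold should come out at a few times $10^{-18}$ cm$^2$ and the recombination coefficient at $10^4$ K at a few times $10^{-13}$ cm$^3$ s$^{-1}$.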



$T_0$ coefficient in the IGM equation of state:

[Equations (B3) and (B4) are not recoverable from the extracted text.]

$T_{\rm reion} = 24{,}000$ K, and we take $z_{\rm reion} = 1/a_{\rm reion} - 1 = 10$.

IGM equation of state exponent $\beta$:

[Equations (B5) and (B6) are not recoverable from the extracted text.]



C. THE EDGEWORTH EXPANSION 



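The Edgeworth expansion (e.g., Juszkiewicz et al. 1995; Blinnikov & Moessner 1998) is an expansion of a nearly
Gaussian distribution in terms of its moments. Since the Lyα optical depth statistics are nearly lognormal, their PDF
is well approximated by an Edgeworth expansion in the first four moments:

$$f_{\ln\tau}(\ln\tau) = \frac{1}{\sigma_{\ln\tau}} \left\{ 1 + \frac{1}{3!}\,{\rm skew}(\ln\tau)\,H_3(\eta) + \frac{1}{4!}\,{\rm kurt}(\ln\tau)\,H_4(\eta) + \frac{10}{6!}\,[{\rm skew}(\ln\tau)]^2\,H_6(\eta) \right\} \phi(\eta), \tag{C1}$$

where skew$(\ln\tau)$ and kurt$(\ln\tau)$ are the skewness and kurtosis of $f_{\ln\tau}$, respectively, $\eta \equiv (\ln\tau - \langle\ln\tau\rangle)/\sigma_{\ln\tau}$, the $H_n(x)$ are Hermite polynomials
(e.g., Abramowitz & Stegun 1965), and

$$\phi(\eta) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{\eta^2}{2}\right) \tag{C2}$$

is a standard Gaussian PDF. Unbiased estimators of the skewness and kurtosis are given by

$${\rm skew}(\ln\tau) = \frac{N\sqrt{N-1}}{N-2}\, \frac{\sum_{i=1}^{N} (\ln\tau_i - \langle\ln\tau\rangle)^3}{\left[\sum_{i=1}^{N} (\ln\tau_i - \langle\ln\tau\rangle)^2\right]^{3/2}} \tag{C3}$$

and

$${\rm kurt}(\ln\tau) = \frac{(N+1)N(N-1)}{(N-2)(N-3)}\, \frac{\sum_{i=1}^{N} (\ln\tau_i - \langle\ln\tau\rangle)^4}{\left[\sum_{i=1}^{N} (\ln\tau_i - \langle\ln\tau\rangle)^2\right]^{2}} - \frac{3(N-1)^2}{(N-2)(N-3)}. \tag{C4}$$

The Edgeworth expansion is used to approximate PDFs of $\ln\tau$ in Sections 6 and 7.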
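As an illustration of how the expansion and the moment estimators of this appendix can be evaluated, a minimal sketch follows. It is not the authors' implementation, and it rests on three assumptions stated here and in the comments: the Hermite polynomials are taken to be the probabilists' polynomials $He_n$, the kurtosis is the excess kurtosis (zero for a Gaussian), and the argument $\eta$ is the standardized variable $(\ln\tau - \langle\ln\tau\rangle)/\sigma_{\ln\tau}$. All function names are hypothetical.

```python
import math


def sample_skew(x):
    """Bias-corrected sample skewness (standard G1 estimator)."""
    n = len(x)
    mean = sum(x) / n
    s2 = sum((v - mean) ** 2 for v in x)
    s3 = sum((v - mean) ** 3 for v in x)
    return n * math.sqrt(n - 1) / (n - 2) * s3 / s2 ** 1.5


def sample_excess_kurt(x):
    """Bias-corrected sample excess kurtosis (standard G2 estimator)."""
    n = len(x)
    mean = sum(x) / n
    s2 = sum((v - mean) ** 2 for v in x)
    s4 = sum((v - mean) ** 4 for v in x)
    lead = (n + 1) * n * (n - 1) / ((n - 2) * (n - 3))
    return lead * s4 / s2 ** 2 - 3.0 * (n - 1) ** 2 / ((n - 2) * (n - 3))


def edgeworth_pdf(y, mean, sigma, skew, kurt):
    """Edgeworth-expanded PDF of y = ln(tau) up to the skew^2 term.

    Assumes probabilists' Hermite polynomials and excess kurtosis.
    """
    eta = (y - mean) / sigma
    he3 = eta ** 3 - 3.0 * eta
    he4 = eta ** 4 - 6.0 * eta ** 2 + 3.0
    he6 = eta ** 6 - 15.0 * eta ** 4 + 45.0 * eta ** 2 - 15.0
    phi = math.exp(-0.5 * eta * eta) / math.sqrt(2.0 * math.pi)
    correction = (1.0 + skew / 6.0 * he3 + kurt / 24.0 * he4
                  + 10.0 / 720.0 * skew ** 2 * he6)
    return phi * correction / sigma


def integrate_pdf(mean, sigma, skew, kurt, half_width=10.0, steps=4000):
    """Trapezoidal integral of the PDF over mean +/- half_width * sigma."""
    lo = mean - half_width * sigma
    h = 2.0 * half_width * sigma / steps
    total = 0.5 * (edgeworth_pdf(lo, mean, sigma, skew, kurt)
                   + edgeworth_pdf(lo + steps * h, mean, sigma, skew, kurt))
    for k in range(1, steps):
        total += edgeworth_pdf(lo + k * h, mean, sigma, skew, kurt)
    return total * h
```

Because the Hermite corrections integrate to zero against the Gaussian weight, the expanded PDF stays normalized; it can, however, become negative in the tails when the skewness or kurtosis is large, which is a known limitation of truncated Edgeworth series.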

D. EFFECT OF CORRELATIONS ON THE LIKELIHOOD FUNCTION 

In this section, we present a toy model to understand the effect of correlations on the likelihood function. Specifically,
we seek to quantify the error that is made if neighboring data points are correlated, but a likelihood estimator
assuming independence is used. For example, in this paper we pretended that points separated by a "correlation
length" were perfectly independent (e.g., in §5.2). We would like to know how well our estimate of the likelihood
obtained in this way represents the true likelihood.

Consider $N$ independent pairs of Gaussian random variables $X_{i,1}$ and $X_{i,2}$, each with mean $\mu$ and variance
$\sigma^2$. The Gaussian functional form is well-motivated, since $\ln\tau$ is approximately normally distributed. $X_{i,1}$ and
$X_{i,2}$ are correlated with one another, with covariance matrix

$$\Sigma = \begin{pmatrix} \sigma^2 & \zeta \\ \zeta & \sigma^2 \end{pmatrix}, \tag{D1}$$

so that their joint PDF is given by

$$f(\mathbf{x}) = \frac{1}{2\pi|\Sigma|^{1/2}} \exp\left[-\frac{1}{2}(\mathbf{x}-\mu)\,\Sigma^{-1}\,(\mathbf{x}-\mu)^T\right], \tag{D2}$$

where $\mathbf{x} = (x_1, x_2)$. Let $\mathcal{L}^{i,\rm est}$ be the likelihood for $\mu$ estimated from $X_{i,1}$ and $X_{i,2}$, ignoring the correlation between
the two random variables, and $\mathcal{L}^{i,\rm true}$ be the "true" likelihood. Then:



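$$\ln\mathcal{L}^{i,\rm est} = -\ln\left[2\pi\sigma^2\right] - \frac{1}{2\sigma^2}\left[(x_1-\mu)^2 + (x_2-\mu)^2\right] \tag{D3}$$

and

$$\ln\mathcal{L}^{i,\rm true} = -\ln\left[2\pi\sigma^2(1-\epsilon^2)^{1/2}\right] - \frac{1}{2\sigma^2(1-\epsilon^2)}\left[(x_1-\mu)^2 - 2\epsilon(x_1-\mu)(x_2-\mu) + (x_2-\mu)^2\right], \tag{D4}$$

where $\epsilon \equiv \zeta/\sigma^2$. Given that $(X_{i,1}, X_{i,2})$ is distributed as in equation (D2), we may compute the expectation values of
$\ln\mathcal{L}^{i,\rm est}$ and $\ln\mathcal{L}^{i,\rm true}$: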



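$$\langle\ln\mathcal{L}^{i,\rm est}\rangle = -\ln\left[2\pi\sigma^2\right] - 1 - \left(\frac{\Delta\mu}{\sigma}\right)^2 \tag{D5}$$

and

$$\langle\ln\mathcal{L}^{i,\rm true}\rangle = -\ln\left[2\pi\sigma^2(1-\epsilon^2)^{1/2}\right] - 1 - \frac{1}{1+\epsilon}\left(\frac{\Delta\mu}{\sigma}\right)^2, \tag{D6}$$

where $\Delta\mu$ is the difference between the true mean and the value at which the likelihood is evaluated. Since $\langle\ln\mathcal{L}^{i,\rm est}\rangle$ is
maximum for $\Delta\mu = 0$, the maximum likelihood estimate for $\mu$ is unbiased, even if the estimator ignores the correlations
between the random variables. An estimate for the width of $\langle\ln\mathcal{L}\rangle$ is given by the curvature at the maximum likelihood
value, $\sigma(\mathcal{L}) \equiv \left[-\partial^2\langle\ln\mathcal{L}\rangle/\partial(\Delta\mu)^2\right]^{-1/2}$. This expression is exactly equal to the standard deviation of the likelihood
function when the latter is Gaussian. Explicitly,

$$\sigma(\mathcal{L}^{i,\rm est}) = \frac{\sigma}{\sqrt{2}} \quad {\rm and} \quad \sigma(\mathcal{L}^{i,\rm true}) = \sqrt{1+\epsilon}\,\frac{\sigma}{\sqrt{2}}. \tag{D7}$$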



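Let us now consider the total likelihood for $\mu$, as estimated from the $N$ pairs $(X_{i,1}, X_{i,2})$. By independence,

$$\mathcal{L}^{\rm tot} = \prod_{i=1}^{N} \mathcal{L}^{i}, \tag{D8}$$

so that

$$\ln\mathcal{L}^{\rm tot} = \sum_{i=1}^{N} \ln\mathcal{L}^{i}. \tag{D9}$$

Therefore,

$$\langle\ln\mathcal{L}^{\rm tot}\rangle = N\langle\ln\mathcal{L}^{i}\rangle, \tag{D10}$$

and

$$\sigma(\mathcal{L}^{\rm tot,est}) = \frac{\sigma}{\sqrt{2N}} \quad {\rm and} \quad \sigma(\mathcal{L}^{\rm tot,true}) = \sqrt{1+\epsilon}\,\frac{\sigma}{\sqrt{2N}}. \tag{D11}$$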

Thus, the width of the estimated likelihood, when correlations are ignored, is underestimated by a factor $1/\sqrt{1+\epsilon}$
($\approx 1 - \epsilon/2$ for $\epsilon \ll 1$). In particular, the error on the width of the likelihood is independent of the number of
independent pairs of correlated data. In the limit $\epsilon \to 0$ (no correlation), $\sigma(\mathcal{L}^{\rm tot,est}) \to \sigma(\mathcal{L}^{\rm tot,true})$, and the standard
result for the error on the mean is recovered. In the limit $\epsilon \to 1$ (the two data points from each pair are the same),
$\sigma(\mathcal{L}^{\rm tot,true}) \to \sigma/\sqrt{N}$, again as expected, since there are half as many independent data points as the number
of points from which the mean is estimated.
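The toy model of this appendix can be checked with a short Monte Carlo experiment. The sketch below is illustrative only and is not part of the original analysis: it draws $N$ correlated Gaussian pairs with unit-normalized correlation coefficient `eps`, estimates the mean from all $2N$ points (which is the maximum likelihood estimate when correlations are ignored), and compares the scatter of that estimate over many realizations with the two analytic widths $\sigma/\sqrt{2N}$ and $\sqrt{1+\epsilon}\,\sigma/\sqrt{2N}$. The function names, the seed, and the sample sizes are arbitrary choices.

```python
import math
import random


def likelihood_widths(sigma, eps, n_pairs):
    """Analytic widths of the total likelihood: (correlations ignored, true)."""
    est = sigma / math.sqrt(2.0 * n_pairs)
    true = math.sqrt(1.0 + eps) * est
    return est, true


def scatter_of_mean(sigma, eps, n_pairs, n_trials, seed=0):
    """Empirical standard deviation of the sample mean of 2*n_pairs points.

    Each pair (x1, x2) has zero mean, variance sigma^2 and covariance
    eps * sigma^2; different pairs are independent.
    """
    rng = random.Random(seed)
    means = []
    for _ in range(n_trials):
        total = 0.0
        for _ in range(n_pairs):
            z1 = rng.gauss(0.0, 1.0)
            z2 = rng.gauss(0.0, 1.0)
            x1 = sigma * z1
            x2 = sigma * (eps * z1 + math.sqrt(1.0 - eps * eps) * z2)
            total += x1 + x2
        means.append(total / (2.0 * n_pairs))
    avg = sum(means) / n_trials
    var = sum((m - avg) ** 2 for m in means) / (n_trials - 1)
    return math.sqrt(var)


est_width, true_width = likelihood_widths(sigma=1.0, eps=0.5, n_pairs=50)
measured = scatter_of_mean(sigma=1.0, eps=0.5, n_pairs=50, n_trials=2000, seed=1)
print(f"width ignoring correlations: {est_width:.4f}")
print(f"width including correlations: {true_width:.4f}")
print(f"measured scatter of the mean: {measured:.4f}")
```

In such an experiment the measured scatter should track the larger, correlation-aware width, while the independence assumption yields an error bar that is too small by the factor $1/\sqrt{1+\epsilon}$ regardless of the number of pairs.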